(I) Real Party in Interest

The real party in interest in this Appeal is E. I. du Pont de Nemours 
and Company, the assignee of the entire right, title and interest of the above- 
identified patent application. 

(II) Related Appeals and Interferences

There are no related Appeals or Interferences which will directly 
affect or be directly affected by or have a bearing on the Board's decision in the 
pending Appeal. 

(III) Status of Claims

Claims 1-20 were originally filed.

Claims 21-38 were added during prosecution and then claims 1-38 
were cancelled. Claims 39-53 were added during prosecution and were rejected. 
There are three independent claims: 39, 44 and 49. 

The currently pending and appealed claims are claims 39-53 which 
are set forth in the Claims Appendix attached hereto. 

(IV) Status of Amendments Filed Subsequent to Final Rejection

A Response After Final was filed electronically on February 4, 2008. 
Claim 44 was the only claim amended. The Response After Final was entered 
as set forth in the Advisory Action dated March 5, 2008. 

(V) Summary of the Invention

Lysine-ketoglutarate reductase (LKR) and saccharopine
dehydrogenase (SDH) catalyze the first and second steps, respectively, in the
breakdown pathway of lysine resulting in the product of saccharopine or
alpha-amino adipic acid. Thus, the ability to down-regulate expression of the LKR/SDH
gene can lead to an increase in the level of lysine in seed by preventing, either
partially or fully, the breakdown of lysine.

Claim 39 of the instant invention relates to a chimeric gene capable of
causing an increased level of lysine in seeds obtained from a transformed plant,
the chimeric gene comprising:

a) an isolated nucleic acid fragment comprising a nucleic acid sequence
which is useful in antisense inhibition or sense suppression of endogenous lysine
ketoglutarate reductase/saccharopine dehydrogenase activity in a plant or plant 
cell wherein said isolated nucleic acid fragment comprises all or a part of the 
nucleic acid sequence encoding a plant lysine ketoglutarate 
reductase/saccharopine dehydrogenase, said part being sufficient in length for 
use in antisense inhibition or sense suppression; and 

b) at least one regulatory sequence operably linked to said fragment.

Also claimed are plants transformed with this chimeric gene, seeds
obtained from such transformed plants and a method for increasing the lysine
content in a plant seed using this chimeric gene.

This is discussed in the specification on page 31 at line 30 through the last 
line on page 37 and in Example 20 on pages 92-98. 

Claim 44 is virtually identical to claim 39 with the exception that the 
transformed plant is a corn plant. 

This is discussed in the specification on page 31 at line 30 through the last 
line on page 37 and in Example 20 on pages 92-98. 

Claim 49 is similar to claim 44 in that the transformed plant is a corn plant. 
Claim 49 recites that the isolated nucleic acid fragment comprises all or a part of
the nucleic acid fragment of SEQ ID NO:120 which is the sequence of a 3,265
nucleotide cDNA from corn.

This is discussed in the specification on page 8, line 24, page 31 at line 30 
through the last line on page 37 and in Example 20 on pages 92-98. 



(VI) Grounds of Rejection To Be Reviewed on Appeal



There are two grounds of rejection presented for review:

(a) Whether claims 39-53 comply with the written description requirement
under 35 USC §112, first paragraph, in view of the following: 

(i) diagrams (sequence alignments) that were not part of the 
specification but contain sequence(s) that are disclosed in the 
specification; 

(ii) two post-filing date publications (in view of the priority claimed)
that were not available at the time of the invention (one of which was
co-authored by the above-identified co-inventors) but provide information
about the sequences disclosed and claimed in the instant application; and

(iii) two Declarations of Dr. Carl Falco, one of the co-inventors of
the subject application, purportedly because only one example of a
sequence that functions is provided since "one example of a sequence is
not sufficient to support the claimed genus of any nucleic acid sequence
which is useful in inhibition of LKR/SDH activity in a plant or plant cell...."

(b) Whether claims 39-53 comply with the enablement requirement under
35 USC §112, first paragraph, in view of the two Declarations of Dr. Carl Falco,
sequence alignments and two post-filing date
publications, one of which was co-authored by Dr. Falco and Dr. Epelbaum, the
co-inventors of the subject application.

(VII) Argument

(a) The rejection of claims 39-53 under 35 USC §112, first paragraph,
as failing to comply with the written description requirement.

Drs. Falco and Epelbaum, the co-inventors of the claimed invention, were
the first to report the molecular cloning of a plant LKR/SDH genomic and cDNA
sequence. They subsequently co-authored a paper (Epelbaum et al., Plant
Molecular Biology 35:735-748 (1997)) that was published subsequent to the filing
of the above-identified application. A copy of this paper was previously submitted
and is attached hereto as Evidence Appendix A.

Epelbaum et al. and Example 20 on page 94 describe the isolation of the
gene encoding LKR/SDH from an Arabidopsis thaliana genomic DNA library
based on the homology between the yeast biosynthetic genes encoding SDH
(lysine-forming) or SDH (glutamate-forming) and Arabidopsis expressed
sequence tags.

Primers were designed from these expressed sequence tags (ESTs) 
(page 736 of the paper under "Materials and Methods", section "Gene Isolation"). 
The sequences of these ESTs, T13618 and T45802, correspond to SEQ ID NOs:
102 and 103, respectively, of the instant specification (page 32, 3rd paragraph
and Example 20, page 94, last paragraph).

The sequences of these ESTs served as the basis for designing primers (SEQ
ID NOs:108 and 109) for use in the PCR amplification of a 2.24 kb DNA fragment
from genomic Arabidopsis DNA (specification, page 32, third paragraph, and
page 95, first paragraph, and Epelbaum et al., page 736 "Materials and Methods"
section named "Gene isolation").

The 2.24 kb DNA fragment was then used to isolate a larger genomic DNA
fragment. The sequence of this larger genomic fragment is provided in SEQ ID
NO:110 of the specification and corresponds to the nucleotide sequences shown
in Figure 2 of the Epelbaum et al. paper. Subsequently the full length DNA coding
sequence for the Arabidopsis LKR/SDH was isolated via RT-PCR. The sequence
of the Arabidopsis LKR/SDH cDNA is provided in SEQ ID NO:111 of the instant
specification and is indicated by capital letters in the nucleotide sequence in Fig. 2
of the paper. The deduced amino acid sequence of the Arabidopsis LKR/SDH
protein is shown in SEQ ID NO:112 and corresponds to the amino acid sequence
shown in Fig. 2 of the Epelbaum et al. paper.

The deduced amino acid sequence set forth in Figure 2 on pages 739-741 
of Epelbaum et al. shows that in plants the LKR/SDH activities are carried on a 
single bi-functional protein. Function of the Arabidopsis LKR/SDH protein can be
assayed using previously described assays with some minor modifications (page
738 of paper under "LKR specific activity", "SDH specific activity"). The "SDH
portion" (SEQ ID NO:131 of the instant specification) of the bi-functional
Arabidopsis LKR/SDH protein could be successfully expressed and assayed in
E. coli.

Application No.: 10/804678
Docket No.: BB1037USCNT

Accordingly, what is discussed in the Epelbaum et al. paper relates
directly to the sequence and subject matter of the instant application. Even
though this paper was published after the priority date of the instant application, it
simply further discusses the sequence already disclosed and claimed in the
instant application.

Figure 4 on page 744 of the Epelbaum paper sets forth a comparison of 
the deduced amino acid sequences of three fungal genes encoding SDH (lysine 
forming) with the A. thaliana LKR. 

Figure 5 is a comparison of the deduced amino acid sequence of the S. 
cerevisiae SDH (glutamate forming) and the A. thaliana SDH. The Arabidopsis 
sequences used in these comparisons are the same Arabidopsis sequences 
disclosed in the instant application. In fact, the comparison in Figure 5 of 
Epelbaum is similar to the comparison in Figure 9 of the instant application. 
Figure 9 is described on page 10 at lines 1-2 of the instant specification as 
showing "the amino acid similarity between the polypeptides encoded by two 
plant cDNAs and fungal S. cerevisiae (glutamate forming)." Figure 9 is also 
discussed in Example 20 on page 95 at lines 1-3 of the instant application. 

Based on comparison of the Arabidopsis LKR and SDH with other LKR
and SDH proteins, as mentioned above, degenerate primers (SEQ ID NOS:113
and 114) were designed and additional LKR and SDH sequences from corn and
soy were identified and isolated (page 95, last paragraph through page 96 first
paragraph of instant application). Subsequently, near full length soy and corn
LKR/SDH sequences were obtained. The comparison of the corn and soy
LKR/SDH sequences with ESTs from other plants enabled the identification and
isolation of sequences from rice and wheat (described on page 95 at line 33
through the end of page 96).

A cosuppression experiment using a modified shorter version (1268 bp
fragment, see below) of the corn LKR/SDH (SEQ ID NO:120) is discussed in the
specification on page 97 at lines 15-36.

Dr. Falco's declaration(s) provided additional data showing that the 1268
bp gene fragment including the LKR coding domain obtained from the corn
LKR-SDH sequence (SEQ ID NO:120) was successfully used in cosuppression
studies to produce seeds having increased accumulation of lysine. This increase
in lysine appeared to be directly related to the co-suppression of LKR/SDH.

Dr. Falco's Declaration dated August 24, 2000 (copy provided in Evidence
Appendix B) shows that two important elements that are necessary and sufficient
to practice the invention are provided: (1) the motivation to "knock out" LKR (as
is set forth in paragraph 4 of Dr. Falco's declaration) and (2) disclosure of the
first nucleic acid fragments encoding a plant LKR. With these fragments in hand,
it was possible to isolate LKR fragments from any other plant desired, and
use them to block expression utilizing antisense inhibition and/or cosuppression.
Dr. Falco's declaration demonstrates that blocking the first step in lysine 
catabolism, i.e., "knocking out" LKR, leads to increased accumulation of lysine in 
seeds. 

The Declaration of Dr. Falco, one of the co-inventors of the subject case,
dated February 16, 2001 (copy provided in Evidence Appendix C), sets forth
data showing seeds with increased lysine content that were obtained from plants
co-transformed with DHDPS and LKR. The LKR sequence, a 1268 bp gene
fragment obtained from the sequence comprising the near full length corn
LKR/SDH (SEQ ID NO:120), was successfully used to increase lysine, and the
increase correlated with cosuppression of LKR/SDH.

The experiments discussed in Dr. Falco's previously submitted declaration
taken together with the detailed description of the invention provided in the patent
application and the previous declaration (dated August 24, 2000), clearly
demonstrate that an increased lysine content is achieved when a foreign lysine
insensitive DHDPS gene (with or without a lysine insensitive AK gene) is
combined with a foreign co-suppressing LKR gene.

Another reference that demonstrates that the nucleotide sequences
described in the invention encode plant lysine-ketoglutarate reductase and
saccharopine dehydrogenase proteins is Tang et al., Plant Cell 9:1305-1316
(1997), entitled "Regulation of lysine catabolism through lysine-ketoglutarate
reductase and saccharopine dehydrogenase in Arabidopsis" (copy provided in
Evidence Appendix D).

This paper reports the cloning of an Arabidopsis cDNA encoding a 
bifunctional polypeptide that contains both of these enzymatic activities linked to 
each other. 

The Arabidopsis sequence disclosed by Tang et al. is essentially identical
to SEQ ID NO:111 of the instant application. Tang et al., page 1308, right-hand
column, discloses that bacterial cells transformed with a plasmid having the LKR
and SDH insert showed SDH, but no LKR activity. However, yeast cells
transformed with a plasmid having the LKR insert had significantly higher LKR
activity than control yeast cells transformed with the same plasmid lacking the
LKR insert.

Given that the Arabidopsis sequence disclosed in the instant application,
SEQ ID NO:111, is essentially identical to that disclosed by Tang et al., it
would be expected that SEQ ID NO:111 would also produce LKR activity if
expressed in yeast as described by Tang et al.

Structural and functional properties of the bifunctional LKR/SDH enzyme 
are discussed in the Tang et al. paper, starting on page 1312, left hand column. 
This is the same enzyme disclosed in the instant application. 

Analysis of LKR and SDH activities is described on page 1315, left hand
column, and such analysis is well within the skill in the art.

It is respectfully submitted that in view of Epelbaum et al. and Tang et al.,
it should be clear that there is a correlation between sequence similarity and
functionality insofar as LKR/SDH activity is concerned.

Furthermore, it should be reiterated that the information regarding the full 
length Arabidopsis LKR/SDH nucleotide and amino acid sequences, the 
expression and assay of the SDH portion of the Arabidopsis LKR/SDH, the soy 
and corn partial LKR/SDH sequences and the use of LKR for cosuppression 
experiments was available as of March 27, 1997 in priority application having 
application no.: 08/824,6127. 

Accordingly, it is respectfully submitted that the claims fully comply with 
the written description requirement of 35 USC §112, first paragraph. 

(b) The rejection of claims 39-53 under 35 USC §112, first
paragraph, as failing to comply with the enablement requirement.

It is believed that all of the foregoing discussion, references and 
information discussed above with respect to written description rejection, are 
equally apposite with respect to the enablement rejection of claims 39-53 under 
35 USC §112, first paragraph. 

Specifically, it is stated on page 4 of the Office Action mailed on January 
25, 2007 that the "specification does not demonstrate that any of the claimed 
sequences have homology to saccharopine dehydrogenase (SDH) and that SEQ 
ID NO: 120 and 122 are not full length sequences (page 34). Therefore, it is even 
more uncertain that the claimed sequences would encode the portions required 
to confer LKR activity."

It was further stated on page 5 of this same Office Action that "De Luca 
teaches that modifying plant biosynthetic pathways by transforming plants with 
genes encoding enzymes involved in a biosynthetic pathway is highly 
unpredictable and often the desirable results are impossible to achieve".

It is respectfully submitted that ample information is available in the case of
the lysine biosynthetic and catabolic pathways that clearly demonstrates how to
increase lysine production via modification of the biosynthetic and catabolic 
pathways. The use of lysine feedback-insensitive versions of the key 
biosynthetic enzymes, DHDPS and AK, has been shown to lead to an increase in 
free lysine levels. The instant specification teaches that blocking the first step in 
lysine catabolism will lead to increased accumulation of lysine. 

Taking this together with all of the information discussed above with
respect to the written description rejection, it is respectfully submitted that one
of ordinary skill in the art would be able to practice the claimed invention without
engaging in undue experimentation.

As was discussed above, the Declaration of Dr. Falco, one of the co-inventors
of the subject case, dated February 16, 2001 (copy provided in Evidence Appendix C),
sets forth data showing seeds with increased lysine content that were obtained
from plants co-transformed with DHDPS and LKR. The LKR sequence, a 1268
bp gene fragment obtained from the sequence comprising the near full length
corn LKR/SDH (SEQ ID NO:120), was successfully used to increase lysine, and the
increase correlated with cosuppression of LKR/SDH.

The experiments discussed in Dr. Falco's previously submitted declaration
taken together with the detailed description of the invention provided in the patent
application and the previous declaration (dated August 24, 2000), clearly
demonstrate that an increased lysine content is achieved when a foreign lysine
insensitive DHDPS gene (with or without a lysine insensitive AK gene) is
combined with a foreign co-suppressing LKR gene.

Reference was made to Doerks (TIG 14, no. 6:248-250, June 1998) (copy
provided in Evidence Appendix E) for the proposition that sequence homology is
not sufficient to predict function of an encoded sequence; reference was made to
Smith et al. (Nature Biotechnology 15:1222-1223, November 1997) (copy
provided in Evidence Appendix F) for the proposition that homologous proteins
can have different functionality; reference was made to Brenner (TIG 15, 4:132-133,
April 1999) (copy provided in Evidence Appendix G) which discusses the
problem of inferring function from homology and Bork (TIG 12, 10:425-427,
October 1996) (copy provided in Evidence Appendix H) which teaches problems
with sequence databases that can result in the misinterpretation of sequence
data.

Given what has been discussed herein, it is respectfully submitted that 
there is no such problem with respect to the claimed invention. 

Attached hereto is Evidence Appendix I which is an alignment of the LKR
domains of the plant bifunctional LKR/SDH proteins from Arabidopsis (SEQ ID
NO:112), corn (SEQ ID NO:122, encoded by SEQ ID NO:120) and soybean
(SEQ ID NO:121) and the monofunctional lysine-forming SDH proteins from
S. cerevisiae (gi:453184), C. albicans (gi:1170847) and Y. lipolytica (gi:173262).

Evidence Appendix J (submitted herewith) is a comparison of the SDH
domains of the bifunctional plant LKR/SDH proteins from Arabidopsis (SEQ ID
NO:112), corn (SEQ ID NO:122, encoded by SEQ ID NO:120) and soybean
(SEQ ID NO:121) and the monofunctional glutamate-forming SDH protein from
S.cerevisiae (gi:729968). Residues that are identical among at least one of the 
plant sequences and at least one of the yeast sequences are indicated by an 
asterisk above each alignment. Residues that are identical among at least two 
plant sequences are indicated by a plus sign above each alignment. 
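
The marking convention just described can be illustrated with a short sketch. This is not the program used to prepare Evidence Appendix J; the function name, the toy sequences, and the choice to let an asterisk take precedence when both conditions hold in the same column are assumptions made only for illustration.

```python
def annotate_columns(plant_rows, yeast_rows):
    """Return one mark per alignment column.

    '*' : a residue is shared by at least one plant row and one yeast row
    '+' : a residue is shared by at least two plant rows (and no '*' applies)
    ' ' : neither condition holds
    Gap characters ('-') never count as identical residues.
    """
    length = len(plant_rows[0])
    if any(len(row) != length for row in plant_rows + yeast_rows):
        raise ValueError("all aligned rows must have the same length")

    marks = []
    for i in range(length):
        plant_col = [row[i] for row in plant_rows if row[i] != "-"]
        yeast_col = {row[i] for row in yeast_rows if row[i] != "-"}
        if any(residue in yeast_col for residue in plant_col):
            marks.append("*")  # assumed precedence: asterisk wins over plus
        elif any(plant_col.count(residue) >= 2 for residue in plant_col):
            marks.append("+")
        else:
            marks.append(" ")
    return "".join(marks)


# Hypothetical toy rows, not taken from any SEQ ID NO in the application.
toy_plants = ["MKTL", "MRTV", "QRSI"]
toy_yeast = ["MAGA"]
print(repr(annotate_columns(toy_plants, toy_yeast)))
```

In the toy input only the first column shares a residue between a plant row and the yeast row, the second and third columns are conserved among two plant rows only, and the last column is not conserved.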

The plant LKR domains share about 70% and 60% sequence identity with 
each other, respectively, whereas the plant LKR domains and yeast lysine- 
forming SDH proteins share between 15% and 17% sequence identity. 

The plant SDH domains share about 60% sequence identity among each 
other and around 30% sequence identity with the yeast protein. Alignments and 
percent identity calculations were performed using the Clustal V method of
alignment. 
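
For readers unfamiliar with how a percent identity figure is derived from an alignment, the following is a minimal sketch. It does not reimplement the Clustal V method; it only counts identical residues in two rows that have already been aligned. The denominator used here (columns in which neither row has a gap) is an assumption, since alignment programs differ on this convention, and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of gap-free aligned columns in which both rows agree."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # assumed convention: gapped columns are not compared
        compared += 1
        if a == b:
            identical += 1

    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Hypothetical aligned fragments, not taken from the Evidence Appendices.
toy_a = "MKT-LLVAGG"
toy_b = "MRTALLV-GG"
print(percent_identity(toy_a, toy_b))
```

Applied to full-length aligned domains, a calculation of this kind yields the sort of pairwise figures quoted above, although the exact values depend on the alignment method and gap convention.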

The comparisons set forth in Evidence Appendices I and J demonstrate
that the sequences of the invention possess stretches of highly conserved
regions. One skilled in the art would appreciate that the more highly conserved a
residue is, the less likely that it could be modified and function maintained. From
these alignments, one could quickly determine which amino acid residues might
be modified in SEQ ID NO:122 (encoded by SEQ ID NO:120) without a likely
change in function.

In the instant specification, the cDNA fragments of the bifunctional
Arabidopsis LKR/SDH were identified based on the homology to the
monofunctional proteins from yeast. The sequence similarity between the yeast
and plant polypeptides (Fig. 9 of instant specification) demonstrated that these
cDNAs encode Arabidopsis saccharopine dehydrogenase.

The complete genomic sequence of the Arabidopsis LKR/SDH gene was
subsequently isolated and the cDNA sequence and corresponding amino acid
sequence determined. The LKR/SDH cDNA revealed an ORF of 3.16 kb, which
predicts a protein of 117 kDa, and confirms that the LKR and SDH enzymes reside
on one polypeptide.
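
The relationship between the ORF length and the predicted protein size can be checked with a rough calculation. The sketch below assumes an average residue mass of about 110 Da, a common rule of thumb rather than a value taken from the specification, and assumes the stated ORF length includes the stop codon; the function name is hypothetical.

```python
# Rule-of-thumb average mass of one amino acid residue (assumption).
AVERAGE_RESIDUE_MASS_DA = 110.0


def estimate_protein_mass_kda(orf_length_bp: int, includes_stop_codon: bool = True) -> float:
    """Estimate protein mass in kDa from an open reading frame length in base pairs."""
    codons = orf_length_bp // 3
    residues = codons - 1 if includes_stop_codon else codons
    return residues * AVERAGE_RESIDUE_MASS_DA / 1000.0


# A 3.16 kb ORF corresponds to roughly 1,050 residues.
print(round(estimate_protein_mass_kda(3160)))
```

The estimate of roughly 116 kDa is in line with the 117 kDa prediction stated above; the exact figure depends on the actual amino acid composition.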

In order to isolate further plant LKR/SDH sequences, degenerate primers
based upon comparison of the Arabidopsis LKR/SDH amino acid sequence with
that of other LKR proteins were designed. These were used to amplify soybean
and corn LKR/SDH fragments using PCR from mRNA, or cDNA synthesized from
mRNA, isolated from developing soybean or corn seeds. Near full length
sequences for the LKR/SDH sequences were obtained using RACE and
hybridization protocols. Furthermore, partial rice and wheat sequences were isolated based
on homology to the Arabidopsis protein.
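
As general background on degenerate primers, the number of distinct nucleotide sequences needed to cover a conserved peptide is the product of the number of codons for each residue in the standard genetic code. The sketch below illustrates only that count; the peptides are hypothetical and are not the sequences underlying SEQ ID NOS:113 and 114.

```python
# Number of codons per amino acid in the standard genetic code.
CODON_COUNTS = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2,
    "Q": 2, "E": 2, "G": 4, "H": 2, "I": 3,
    "L": 6, "K": 2, "M": 1, "F": 2, "P": 4,
    "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}


def primer_degeneracy(peptide: str) -> int:
    """Count the nucleotide sequences that can encode the given peptide."""
    total = 1
    for residue in peptide.upper():
        total *= CODON_COUNTS[residue]
    return total


# Hypothetical conserved motifs, chosen only to show the contrast.
print(primer_degeneracy("MWKD"))  # few codon choices per residue
print(primer_degeneracy("LSR"))   # many codon choices per residue
```

Peptide stretches rich in residues with few codons (such as methionine and tryptophan) give less degenerate primers, which is one reason highly conserved regions are usually screened for such residues.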

Evidence Appendix K (submitted herewith) is an alignment of the plant
bifunctional LKR/SDH proteins from Arabidopsis (SEQ ID NO:112), corn (SEQ ID
NO:122) and soybean (SEQ ID NO:121). Amino acid residues identical among
at least two plant sequences are indicated by an asterisk on the top row; dashes
are used by the program to maximize the alignment of the sequences. The LKR
and SDH domains have been boxed in Evidence Appendix K to facilitate
review. It should also be noted that, in addition to
the LKR and SDH domains, a high degree of homology is also observed in the
intermediary or 'spacer' region of the bifunctional LKR-SDH polypeptide.



(VIII) Conclusion

When this is viewed in combination with the information presented in 
Epelbaum et al. (discussed above), one is inexorably led to the conclusion that 
one skilled in the art can make and use the claimed invention without engaging in 
undue experimentation. 

Accordingly, the Board is respectfully requested to reverse the final 
rejection of pending claims 39-53 and indicate allowability of all claims.

Enclosed herewith is a Petition for a three (3) month extension of time to
permit the filing of the Brief on Appeal. Please charge the fee for the extension of
time of three (3) months, as well as the requisite fee set forth in 37 CFR §1.17(f),
to Appellant's Assignee's (E. I. du Pont de Nemours and Company) Deposit 
Account No. 04-1928. 

Respectfully submitted, 
/Lynne M. Christenbury/ 

LYNNE M. CHRISTENBURY 

ATTORNEY FOR APPLICANTS 
Registration No.: 30,971 
Telephone: (302) 992-5481 
Facsimile: (302) 892-1026



Dated: June 13, 2008

Claim 39. (previously presented) A chimeric gene capable of causing an
increased level of lysine in seeds obtained from a transformed plant, the chimeric
gene comprising: 

a) an isolated nucleic acid fragment comprising a nucleic acid sequence 
which is useful in antisense inhibition or sense suppression of endogenous lysine 
ketoglutarate reductase/saccharopine dehydrogenase activity in a plant or plant 
cell wherein said isolated nucleic acid fragment comprises all or a part of the 
nucleic acid sequence encoding a plant lysine ketoglutarate 
reductase/saccharopine dehydrogenase, said part being sufficient in length for 
use in antisense inhibition or sense suppression; and 

b) at least one regulatory sequence operably linked to said fragment.

Claim 40. (previously presented) A plant comprising the chimeric gene of
Claim 39 in its genome.

Claim 41. (previously presented) Seed obtained from the plant of
Claim 40.

Claim 42. (previously presented) A method for increasing lysine content 
in a plant seed which comprises: 

(a) transforming plant cells with the chimeric gene of Claim 39; 

(b) regenerating fertile mature plants from the transformed plant cells
obtained from step (a) under conditions suitable to obtain seeds;

(c) screening progeny seed of step (b) for increased lysine content; and

(d) selecting those lines whose seeds have increased lysine content.

Claim 43. (previously presented) Seed obtained by the method of
Claim 42.

Claim 44. (previously presented) A chimeric gene capable of causing an 
increased level of lysine in seeds obtained from a transformed corn plant, the 
chimeric gene comprising: 

a) an isolated nucleic acid fragment comprising a nucleic acid sequence 
which is useful in antisense inhibition or sense suppression of endogenous lysine 
ketoglutarate reductase/saccharopine dehydrogenase activity in a corn plant or 
corn plant cell wherein said isolated nucleic acid fragment comprises all or a part 
of the nucleic acid sequence encoding a corn plant lysine ketoglutarate 
reductase/saccharopine dehydrogenase, said part being sufficient in length for
use in antisense inhibition or sense suppression; and

b) at least one regulatory sequence operably linked to said fragment.

Claim 45. (previously presented) A corn plant comprising the chimeric
gene of Claim 44 in its genome.

Claim 46. (previously presented) Seed obtained from the corn plant of 
Claim 45. 

Claim 47. (previously presented) A method for increasing lysine content in 
a corn plant seed which comprises: 

(a) transforming corn plant cells with the chimeric gene of Claim 44;

(b) regenerating fertile mature plants from the transformed corn plant cells
obtained from step (a) under conditions suitable to obtain seeds;

(c) screening progeny seed of step (b) for increased lysine content; and

(d) selecting those lines whose seeds have increased lysine content.

Claim 48. (previously presented) Seed obtained by the method of
Claim 47.

Claim 49. (previously presented) A chimeric gene capable of causing an 
increased level of lysine in seeds obtained from a transformed corn plant, the
chimeric gene comprising: 

a) an isolated nucleic acid fragment comprising a nucleic acid sequence 
which is useful in antisense inhibition or sense suppression of endogenous lysine 
ketoglutarate reductase/saccharopine dehydrogenase activity in a corn plant or
plant cell wherein said isolated nucleic acid fragment comprises all or a part of 
the nucleic acid sequence of SEQ ID NO: 120, said part being sufficient in length 
for use in antisense inhibition or sense suppression; and 

b) at least one regulatory sequence operably linked to said fragment.

Claim 50. (previously presented) A plant comprising the chimeric gene of
Claim 49 in its genome.

Claim 51. (previously presented) Seed obtained from the plant of
Claim 50.

Claim 52. (previously presented) A method for increasing lysine content in 
a plant seed which comprises: 

(a) transforming plant cells with the chimeric gene of Claim 49;

(b) regenerating fertile mature plants from the transformed corn plant cells
obtained from step (a) under conditions suitable to obtain seeds;

(c) screening progeny seed of step (b) for increased lysine content; and

(d) selecting those lines whose seeds have increased lysine content.

Claim 53. (previously presented) Seed obtained by the method of
Claim 52.

Evidence Appendix A

Epelbaum et al., Plant Molecular Biology 35:735-748, 1997

This reference was entered into the record by the Examiner in the Office Action
mailed September 28, 2007, initialed PTO form 1449.

Plant Molecular Biology 35: 735-748, 1997.
© 1997 Kluwer Academic Publishers. Printed in Belgium.

Lysine-ketoglutarate reductase and saccharopine dehydrogenase from 
Arabidopsis thaliana: nucleotide sequence and characterization 

Sabine Epelbaum*, Raymond McDevitt and S. Carl Falco

Agricultural Products, E.I. DuPont de Nemours & Co., Wilmington, DE 19880-0402, USA (*author for
correspondence)

Received 4 April 1997; accepted in revised form 7 July 1997



Key words: Arabidopsis thaliana, heterologous expression, lysine catabolism, lysine-ketoglutarate reductase,
saccharopine dehydrogenase



Abstract 

We isolated the gene encoding lysine-ketoglutarate reductase (LKR, EC 1.5.1.8) and saccharopine dehydrogenase
(SDH, EC 1.5.1.9) from an Arabidopsis thaliana genomic DNA library based on the homology between the
yeast biosynthetic genes encoding SDH (lysine-forming) or SDH (glutamate-forming) and Arabidopsis expressed
sequence tags. A corresponding cDNA was isolated from total Arabidopsis RNA using RT-PCR and 5' and 3' RACE.
DNA sequencing revealed that the gene encodes a bifunctional protein with an amino domain homologous to SDH
(lysine-forming), thus corresponding to LKR, and a carboxy domain homologous to SDH (glutamate-forming).
Sequence comparison between the plant gene product and the yeast lysine-forming and glutamate-forming SDHs
showed 25% and 37% sequence identity, respectively. No intracellular targeting sequence was found at the
N-terminal or C-terminal of the protein. The gene is interrupted by 24 introns ranging in size from 68 to 352 bp and is
present in Arabidopsis in a single copy. 5' sequence analysis revealed several conserved promoter sequence motifs,
but did not reveal sequence homologies to either an Opaque 2 binding site or a Sph box. The 3'-flanking region does
not contain a polyadenylation signal resembling the consensus sequence AATAAA. The plant SDH was expressed
in Escherichia coli and exhibited similar biochemical characteristics to those reported for the purified enzyme from
maize. This is the first report of the molecular cloning of a plant LKR-SDH genomic and cDNA sequence.

Abbreviations: AK, aspartate kinase; CTP, chloroplast transit peptide; DHDPS, dihydrodipicolinate synthase; 
LKR, lysine-ketoglutarate reductase; SDH, saccharopine dehydrogenase. 



Introduction 

Lysine is synthesized in higher plants and in many bacterial
species from aspartate [6, 8]. Its rate of synthesis
in plants is regulated mainly by feedback inhibition
of aspartate kinase (AK) and dihydrodipicolinate synthase
(DHPS) [6]. These enzymes therefore play an
important role in determining the level of free lysine.
Control of the biosynthetic pathway to lysine is of special
interest, since lysine levels are low in the seeds of
important crop plants, such as corn, thereby decreasing
its nutritional quality [9].

The nucleotide sequence data reported will appear in the GenBank,
EMBL and DDBJ Nucleotide Sequence Databases under the
accession numbers U95758 (A. thaliana (Landsberg erecta) LKR-SDH
gene) and U95759 (A. thaliana (Columbia) LKR-SDH gene).

Expression of feedback insensitive bacterial DHPS 
has been shown to result in elevated levels of free lys- 
ine in canola, soybean, and maize seeds [9, Falco et al., 
unpublished results]. In each case the increased level 
of free lysine is accompanied by accumulation of the 
lysine breakdown products saccharopine or α-amino 
adipic acid. Lysine-ketoglutarate reductase (LKR, EC 
1.5.1.8) and saccharopine dehydrogenase (SDH, EC 
1.5.1.9) catalyze the first and second step, respect- 
ively, in the breakdown pathway of lysine that produces 
these intermediates in seeds (Figure 1) [1]. LKR con- 
denses lysine and α-ketoglutarate into saccharopine 
and SDH converts saccharopine to α-amino adipic- 
δ-semialdehyde. Biochemical and genetic evidence 
derived from human and bovine studies demonstrate 
that mammalian LKR and SDH enzyme activities are 
present on a single protein with a monomer molecu- 
lar mass of 115 kDa [25]. Recent results obtained by 
Goncalves-Butruille et al. suggest that both enzyme 
activities from maize also reside on a single protein 
[15]. This contrasts with the fungal enzyme activities, 
which are carried on separate proteins, SDH (lysine- 
forming) with a molecular mass of about 44 kDa and 
SDH (glutamate-forming) with a molecular weight of 
about 51 kDa [10, 12, 28, 34]. In fungi these enzymes 
catalyze the final two steps in the lysine biosynthetic 
pathway rather than a lysine catabolic pathway. Sever- 
al genes for the fungal SDH's have been isolated and 
sequenced, but no plant or animal genes have yet been 
reported. There is little information on the regulation 
of lysine catabolism in plants. Evidence from studies 
on tobacco and maize suggests that LKR expression is 
developmentally regulated and in tobacco seeds LKR 
activity is stimulated through an intracellular signaling 
cascade involving calcium and protein phosphoryla- 
tion, but the exact control mechanisms remain to be 
determined [1, 18, 19]. Nothing is known about the 
intracellular location of the lysine breakdown path- 
way. Lysine biosynthesis appears to be confined to the 
chloroplast [5, 26, 27]. 

In order to achieve a better understanding of the 
physiological role of lysine catabolism in higher plants, 
we have isolated and characterized the gene encoding 
LKR and SDH from the model plant Arabidopsis thaliana. 



Materials and methods 

Strains 

The E. coli strains used were LE392, DH5α and 
BL21(DE3)pLysS (Novagen). The Arabidopsis thaliana 
ecotypes used were Landsberg erecta and Columbia. 

Gene isolation 

Primers were designed from Arabidopsis expressed 
sequence tags (ESTs) T13618 and T45802. These 
primers were used to amplify a 2.24 kb fragment by 
PCR from genomic Arabidopsis DNA. The fragment 
was labeled with digoxigenin (DIG) using Boehringer 
Mannheim's Dig-High Prime kit and protocol. This 



[Figure 1 scheme: lysine + α-ketoglutarate are converted by LKR to saccharopine, which SDH converts to α-aminoadipate-δ-semialdehyde] 

Figure 1. The first two steps of lysine catabolism in mammals 
and plants. LKR, lysine-ketoglutarate reductase; SDH, saccharopine 
dehydrogenase. 



probe was used to screen a CD4-8 Landsberg erecta 
genomic library by plaque hybridization. About 2.7 x 
10' recombinant phage were plated on the host E. coli 
LE392, grown overnight at 37 °C and screened. The 
hybridization temperature was 55 °C; all other steps 
were performed as described in the DIG Wash and Block Set 
protocol (Boehringer Mannheim). Five positive clones 
were isolated, of which four showed similar restriction 
patterns. One of them was subcloned into plasmid vec- 
tor pBluescript SK +/− (Stratagene), transformed into 
DH5α competent cells (Gibco-BRL) and sequenced. 

DNA sequencing and data analysis 

DNA sequence analysis was carried out on both strands 
on an automatic sequencer (Models 377 and 373A) 
using the Ready Reaction FS Terminator sequen- 
cing kit and a 9600 Thermo-cycler (ABI, Applied 
Biosystems). Sequence data were analyzed using the 
Lasergene system (DNAstar, Wisconsin). 

Isolation of total RNA 

Whole Arabidopsis plants were frozen in liquid nitro- 
gen and crushed in a mortar containing 4 ml of 
1 M Tris-HCl pH 9.0 and 1% SDS. The extract was 
transferred to a Sarstedt tube and 4 ml of a phen- 
ol/chloroform/isoamyl alcohol mixture (24:24:1 v/v/v) 
were added. The solution was vortexed and centrifuged 
at 12 000 x g for 10 min at room temperature. The 
supernatant was transferred to a new tube and 0.4 ml 
of a 2 M sodium acetate buffer, pH 5.2 and 8 ml of 
cold ethanol (70%) were added and the solution was 
kept on ice for 1 h. The nucleic acids were precipit- 
ated in a Sorvall centrifuge at 12 000 x g at 4 °C for 
10 min and the supernatant discarded. The precipitate 
was dissolved in 2 ml of deionized sterile water and 
2 ml of a 4 M solution of lithium acetate were added. 
After storage on ice overnight the RNA was precipit- 
ated at 12 000 x g at 4 °C for 10 min, washed with 
2 ml of 70% ethanol, air-dried and dissolved in 0.4 ml 
of 10 mM Tris-HCl pH 7.5 in diethylpyrocarbonate 
(DEPC)-treated water. The RNA was stored at -70 °C 
until further analysis. 

RT-PCR 

RT-PCR was performed using a Perkin-Elmer kit. Total 
RNA (1 µg) from Arabidopsis was reverse-transcribed 
using oligo-dT as a primer. The LKR and SDH gene 
specific products were isolated using oligonucleotide 
primers, which were designed based on homologies 
of the genomic LKR-SDH DNA from Arabidopsis 
with the known coding sequences of the correspond- 
ing fungal proteins and ESTs T13618, T45802, and 
T04246. Overlapping clones were generated, sub- 
cloned into pGEM-T (Promega), transformed into 
DH5α competent cells and sequenced. 

Rapid amplification of cDNA ends (Race) 

Isolation of the 5' and 3' cDNA ends was per- 
formed using the 5' and 3' Race systems for rap- 
id amplification of cDNA ends (Gibco-BRL) accord- 
ing to the supplier's instructions. The reaction was 
started with 1 µg of total RNA. For the 5' Race 
the first and second gene specific primers were 
5'-CAGCAGCCAATGAGGAAT-3' and 
5'-GCTGTCCAAGTCCGTGTAAGAAGTCAACA-3', which 
are complementary to nucleotides 1262-1279 and 
1093-1121 in Figure 3, respectively. Multiple 
bands were obtained after the first amplifica- 
tion. The largest band, which was 650 bp 
in length, was isolated and cloned into the 
pGEM-T vector (Promega). The gene specific 
primer used for the 3' Race was 
5'-TCCTTGAAAGCAAACGTATAGAGAAGCACACT-3', which is 
identical to nucleotides 5559-5590 in Figure 3. The 5' and 3' Race 
products were sequenced as described above. 
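
As a consistency check on the Race primers, each primer should be exactly as long as the nucleotide range it is said to match. The following Python sketch, offered as an illustration rather than as part of the original protocol, tests this for the three primers quoted above and provides a reverse-complement helper for the two 5' Race primers, which are complementary to the listed coordinates. The function and variable names are hypothetical.

```python
# Primers and coordinate ranges as quoted in the text (Figure 3 numbering).
RACE_PRIMERS = {
    "5' Race, first": ("CAGCAGCCAATGAGGAAT", 1262, 1279),
    "5' Race, second": ("GCTGTCCAAGTCCGTGTAAGAAGTCAACA", 1093, 1121),
    "3' Race": ("TCCTTGAAAGCAAACGTATAGAGAAGCACACT", 5559, 5590),
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def span_length(start: int, end: int) -> int:
    """Length of an inclusive, 1-based coordinate range."""
    return end - start + 1


def check_primer_lengths(primers=RACE_PRIMERS) -> dict:
    """Map each primer name to True if its length equals its coordinate span."""
    return {
        name: len(seq) == span_length(start, end)
        for name, (seq, start, end) in primers.items()
    }


if __name__ == "__main__":
    for name, ok in check_primer_lengths().items():
        print(f"{name}: length matches coordinates = {ok}")
```

All three primers pass this check, which supports the reading of the coordinates given above.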

Southern blotting 

Total DNA was isolated from whole Arabidopsis plants 
and 10 µg were digested to completion with SstI or 
NsiI. The digests were loaded on 0.7% agarose gels, 
blotted onto Hybond N membrane (Amersham) and 
hybridized to a DIG-labeled probe corresponding to 
the 1.7 kb long LKR cDNA fragment. Hybridization, 
washing and detection procedures were as described in 
the DIG Wash and Block Set protocol from Boehringer 
Mannheim. The hybridization and washing temperat- 
ures chosen were 55 °C and 65 °C respectively. 

Expression of Arabidopsis LKR-SDH in E. coli cells 

The 3.2 kb long ORF coding for LKR-SDH 
was isolated by reverse transcription and sub- 
sequent PCR amplification using the oligonuc- 
leotides ATGAATTCAAATGGCCATGAGGAG and 
TCATTCTGCCTTCTCCATCAG, which are comple- 
mentary to the 5' and 3' ends, respectively. The res- 
ulting PCR product was purified using the Promega 
PCR product purification kit and subjected to further 
amplifications using the oligonucleotides listed below: 
1: TGAA CCATGGC TTCAAATGGCCATGAGGAG 
2: CATA CCATGG CGAAAAAATCAGGTGTTT 
3: TAT GGTACC TCATTCAGGCTTCTCTTTTATCTC 
4: TCTA GGTACC TCATTCTGCCTTCTCCATCAG 
The complete LKR-SDH coding sequence was 
amplified using primers 1 and 4, which correspond 
to the 5' and 3' ends of the LKR-SDH cod- 
ing sequence, respectively. The LKR and SDH cod- 
ing sequences were amplified separately using either 
primers 1 and 3 (LKR) or 2 and 4 (SDH), where 
primers 2 and 3 extend over nucleotides 3486-3503 
and 3461-3481 (Figure 3), respectively. The primers 
added unique NcoI (primers 1 and 2) and KpnI (primers 
3 and 4) restriction sites (underlined) at the start codon 
and just past the stop codon of the gene, respect- 
ively. The generation of the NcoI sites changed 
the second codon of the LKR region from 
asparagine to alanine and the second codon of the 
SDH cDNA from threonine to alan- 
ine. The PCR products were cloned into the NcoI 
and KpnI restriction sites of the expression vector 
pBT430, a derivative of pET-3a [29], and transformed 
into BL21(DE3)pLysS cells (Novagen). Simultan- 
eously, cells were also transformed with the vector only. 
The transformed cells were plated on LB medium con- 
taining ampicillin (100 µg/ml) and chloramphenicol 
(34 µg/ml) and grown overnight. Protein extracts from 
colonies resistant to antibiotics were subject to SDS- 
PAGE and analyzed for the relevant enzyme activity. 
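
The cloning design described above can be checked directly against the four primer sequences. The sketch below is a minimal illustration, not the authors' procedure: it assumes the standard recognition sequences CCATGG (NcoI) and GGTACC (KpnI), locates them in primers 1 to 4, and reads the codon that follows the ATG embedded in each NcoI site. The helper names are hypothetical.

```python
NCOI = "CCATGG"  # recognition site; its ATG serves as the start codon
KPNI = "GGTACC"

# Primers 1-4 as listed in the text, with the spacing around the sites removed.
EXPRESSION_PRIMERS = {
    1: "TGAACCATGGCTTCAAATGGCCATGAGGAG",
    2: "CATACCATGGCGAAAAAATCAGGTGTTT",
    3: "TATGGTACCTCATTCAGGCTTCTCTTTTATCTC",
    4: "TCTAGGTACCTCATTCTGCCTTCTCCATCAG",
}

ALANINE_CODONS = {"GCT", "GCC", "GCA", "GCG"}


def second_codon(forward_primer: str) -> str:
    """Codon immediately after the ATG that sits inside the NcoI site."""
    atg_start = forward_primer.index(NCOI) + 2  # CC|ATG|G
    return forward_primer[atg_start + 3:atg_start + 6]


def triplet_after_kpni(reverse_primer: str) -> str:
    """First three bases after the KpnI site in a reverse primer.

    'TCA' is the reverse complement of the stop codon TGA.
    """
    site_start = reverse_primer.index(KPNI)
    return reverse_primer[site_start + 6:site_start + 9]


if __name__ == "__main__":
    for n in (1, 2):
        codon = second_codon(EXPRESSION_PRIMERS[n])
        print(f"primer {n}: second codon {codon}, alanine = {codon in ALANINE_CODONS}")
    for n in (3, 4):
        print(f"primer {n}: bases after KpnI = {triplet_after_kpni(EXPRESSION_PRIMERS[n])}")
```

In both forward primers the codon after the ATG is an alanine codon, in agreement with the stated change of the second codon to alanine, and both reverse primers place the KpnI site directly next to the complement of a TGA stop codon.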

Preparation of extracts for enzyme activities and 
protein 

Bacterial cultures (50 ml) were grown in LB media to 
an A600 of 0.6, IPTG was added to a final concentration 
of 1 mM and cells were grown for an additional 3 h. 
Then the cells were centrifuged (5000 x g for 10 min at 
0-4 °C) and washed twice either with 100 mM phos- 
phate buffer pH 7.0 (LKR) or with 100 mM Tris-HCl 
pH 8.5 (SDH) and resuspended in 2 ml of the rel- 
evant extraction buffer (see below). Extracts for the 
determination of LKR and SDH activities were pre- 
pared as described previously [15,19] with some minor 
modifications. The LKR extraction buffer contained 
100 mM phosphate buffer pH 7.0, 1 mM EDTA, 1 mM 
DTT and 15% glycerol. The SDH extraction buffer was 
composed of 100 mM Tris-HCl pH 8.5, 1 mM DTT, 
1 mM EDTA, and 15% glycerol. The cell suspensions 
were frozen (−20 °C), thawed and sonicated at 0 °C 
for 1 min (30 s-1 min-30 s). The broken cells were 
centrifuged for 20 min at 10 000 x g at 0-4 °C and the 
resulting supernatant and pellet were subject to SDS- 
PAGE and analyzed for enzyme activities as described 
below. Protein concentration was determined accord- 
ing to Bradford [2], using BSA as a standard. 

LKR specific activity 

LKR specific activity was determined essentially as 
described [19], except for some minor modifications. 
The reaction mixture contained, in 0.5 ml, 100 mM 
phosphate buffer pH 7.0, 20 mM lysine, 10 mM α- 
ketoglutarate, 0.1 mM NADPH, and 5-100 µg protein. 
Conversion of NADPH to NADP was followed at A340 
at 30 °C. Control aliquots were analyzed in the absence of 
lysine. 

SDH specific activity 

SDH specific activity was determined as described by 
Goncalves-Butruille et al. [15] with some minor modi- 
fications. The reaction buffer contained, in 0.5 ml total 
volume, 20 mM saccharopine, 50 mM Tris-HCl pH 8.5, 
20 mM NAD and 5 µg protein extract. Conversion of 
NAD to NADH was followed at A340 at 30 °C in the 
linear range. Control experiments were performed in 
the absence of saccharopine and activity was calculated 
by subtracting the values of the control assay from the 
values in the assay containing saccharopine. 
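
The control subtraction described above, followed by conversion of the absorbance change into a specific activity, can be sketched as follows. This is an illustrative calculation only: it assumes the commonly used molar extinction coefficient of NAD(P)H at 340 nm (6220 M^-1 cm^-1) and a 1 cm light path, neither of which is stated in the text, and the function name and example readings are hypothetical.

```python
EPSILON_340 = 6220.0  # M^-1 cm^-1 for NADH/NADPH at 340 nm (assumed, not from the text)


def specific_activity(
    delta_a_sample: float,
    delta_a_control: float,
    volume_ml: float,
    protein_mg: float,
    path_cm: float = 1.0,  # assumed cuvette path length
) -> float:
    """Specific activity in nmol NAD(P)H converted per min per mg protein.

    delta_a_sample and delta_a_control are absolute A340 changes per minute
    with and without substrate; the control is subtracted as in the SDH assay.
    """
    net_rate = delta_a_sample - delta_a_control            # A340 units per min
    molar_per_min = net_rate / (EPSILON_340 * path_cm)      # mol per litre per min
    nmol_per_min = molar_per_min * (volume_ml / 1000.0) * 1e9
    return nmol_per_min / protein_mg


if __name__ == "__main__":
    # Hypothetical readings for a 0.5 ml assay containing 5 µg (0.005 mg) protein.
    print(specific_activity(0.0722, 0.0100, volume_ml=0.5, protein_mg=0.005))
```

For the hypothetical readings shown, a net change of 0.0622 A340 units per minute in 0.5 ml corresponds to 5 nmol per minute, or 1000 nmol per minute per mg for 5 µg of protein.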



Results 

Gene isolation 

The amino acid sequences of the fungal biosynthet- 
ic SDH proteins were used to search plant cDNA 
databases using the TBLASTN algorithm. We found 
two previously unidentified Arabidopsis ESTs (Gen- 
Bank/EMBL accession numbers T13618 and T45802) 
that are homologous to the Saccharomyces cerevisiae 
LYS9 gene. These ESTs were used to design primers 
and a 2.24 kb genomic fragment was amplified by PCR 
from genomic Arabidopsis DNA. The sequence simil- 
arity between the fungal glutamate-forming SDH and 
the isolated Arabidopsis fragment suggested that the 
latter contained coding sequences for SDH. Using the 
2.24 kb fragment as a probe we screened a CD4-8 
Landsberg erecta genomic library by plaque hybridiz- 
ation (see Materials and methods). One of the positive 
clones contained a nucleic acid fragment with regions 
that encoded a protein with domains homologous to 
fungal LKR (SDH-lysine-forming) and fungal SDH 
(SDH-glutamate-forming). During the sequencing of 
this DNA fragment another match with an Arabidop- 
sis EST (GenBank/EMBL accession number T04246) 
was found in the 5' LKR encoding region. 

LKR-SDH cDNA isolation 

Alignment between the fungal SDH proteins, the Ara- 
bidopsis ESTs and the genomic DNA fragment isolated 
from Arabidopsis allowed an approximate designation 
of the LKR and SDH coding sequences. Primers were 
designed and overlapping fragments of the correspond- 
ing cDNA were isolated from total Arabidopsis RNA 
by RT-PCR (Materials and methods). The sequences 
of the genomic DNA and cDNA fragments are shown 
in Figure 2. 

Sequence analysis of the complete LKR-SDH 
cDNA revealed an ORF of 3.16 kb, which predicts a 
protein of 117 kDa. The deduced amino acid sequence 
from the cDNA indicates that LKR and SDH domains 
reside on one polypeptide in Arabidopsis. The obser- 
vation that these two domains are linked, as has been 
reported for the purified corn LKR-SDH protein [15], 
provided additional confidence that this clone carried 
the Arabidopsis LKR-SDH gene. 

Gene structure 

A diagram of the Arabidopsis LKR-SDH gene structure 
is shown in Figure 3. The alignment of the genomic and 
cDNA sequences shows that the LKR-SDH apoprotein 
coding sequence is interrupted by 23 introns (Figures 
2 and 3). The intron lengths range from 78 to 203 
nucleotides with a majority lying in the range of 80-99 
nucleotides; this agrees well with previous analyses of 
Arabidopsis introns. The intron sequences also fit very 
well with the plant splice junction consensus sequence 
(Table 1). Only one of the introns has a :GC at its 5' 
end instead of a :GT; this divergence has been reported 
to occur at a low frequency in Arabidopsis. All introns 
have a >58% AT content and all exceed the minimum 
required length of 66 nucleotides to ensure efficient 
splicing [4]. 
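
The intron properties summarized above (length, AT content, and the dinucleotides at the splice junctions) are simple to tabulate. The sketch below is a minimal illustration using the thresholds quoted in the text, 66 nucleotides and 58% AT; the function name and the example intron are hypothetical and are not taken from the LKR-SDH sequence.

```python
MIN_INTRON_LENGTH = 66   # minimum length for efficient splicing, as cited in the text [4]
MIN_AT_FRACTION = 0.58   # AT content that all LKR-SDH introns exceed, per the text


def intron_report(intron: str) -> dict:
    """Summarize length, AT content and junction dinucleotides of one intron."""
    seq = intron.upper()
    if not seq:
        raise ValueError("empty intron sequence")
    at_fraction = (seq.count("A") + seq.count("T")) / len(seq)
    return {
        "length": len(seq),
        "long_enough": len(seq) >= MIN_INTRON_LENGTH,
        "at_fraction": at_fraction,
        "at_rich": at_fraction > MIN_AT_FRACTION,
        "donor": seq[:2],      # 5' end: usually GT, occasionally GC
        "acceptor": seq[-2:],  # 3' end: usually AG
    }


if __name__ == "__main__":
    example = "GT" + "AT" * 40 + "AG"  # made-up 84-nucleotide intron
    print(intron_report(example))
```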

5'- and 3'-flanking regions 

The 5'- and 3'-flanking regions of the Arabidopsis 
LKR-SDH gene were isolated using 5' and 3' Race 
systems (Materials and methods). The adapter primer 
and the gene specific primer chosen for amplification 
always bridged a region containing an intron in the gen- 
omic sequence to ensure that no contaminating DNA 
was amplified. 

Figure 3. Restriction map and exon-intron pattern of the A. thaliana LKR-SDH gene. 

Amplification of the 5' Race product led to the 
production of several bands, probably due to incom- 
plete reverse transcriptase reactions. The largest band 
was gel isolated, analyzed with appropriate restriction 
enzymes, cloned and sequenced. Transcription most 
likely starts at a CTA sequence (Figure 2); upstream 
from the putative transcription start point are TATAAA 
and CAAT sequences. The TATAAA sequence begins 
at -33 and the CAAT sequence begins at -63. The 
position of translation initiation, the ATG codon (pos- 
ition 1 of the deduced amino acid sequence), indicates 
that the mRNA contains a 108 base long 5' leader 
sequence, which is interrupted by a 352 base long 
intron (Figure 2). 

Multiple ATGs that are out of frame with the 
LKR-SDH coding sequence were identified in the 5'- 
untranslated region (positions 74, 451 and 461) (Fig- 
ure 2). The ability of eukaryotic ribosomes to initiate 
translation requires the AUG to reside in the consensus 
sequence ANNAUGN or NNNAUGG [22, 23]. The 
AUGs in the 5' untranslated leader of the Arabidop- 
sis mRNA are not flanked by these consensus initiator 
sequences. The function, if any, of these upstream 
AUGs is unknown. 
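
The initiator-context rule quoted above (an A three bases upstream of the AUG, or a G immediately after it) can be applied mechanically to any leader sequence. The following sketch is an illustration of that rule on DNA-sense sequence (ATG rather than AUG); the function names and the test leader are hypothetical and do not reproduce the LKR-SDH leader.

```python
def in_initiator_context(seq: str, atg_index: int) -> bool:
    """True if the ATG at atg_index fits ANNATGN or NNNATGG."""
    minus3 = seq[atg_index - 3] if atg_index >= 3 else ""
    plus4 = seq[atg_index + 3] if atg_index + 3 < len(seq) else ""
    return minus3 == "A" or plus4 == "G"


def upstream_atgs(leader: str) -> list:
    """List (0-based position, fits_consensus) for every ATG in the leader."""
    leader = leader.upper()
    hits = []
    i = leader.find("ATG")
    while i != -1:
        hits.append((i, in_initiator_context(leader, i)))
        i = leader.find("ATG", i + 1)
    return hits


if __name__ == "__main__":
    # Made-up leader: the first ATG lacks the context, the second has an A at -3.
    print(upstream_atgs("CCCATGCTTTAAAATGGCC"))
```

Applied to the real leader, every upstream ATG reported at positions 74, 451 and 461 would be expected to return False, in line with the statement above.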

The 5'-upstream region was analyzed for other con- 
sensus sequences. The Opaque-2 gene product trans- 
activates expression of the 22 kDa α-zein genes in 
maize endosperm and evidence exists which suggests 
that LKR could also be under Opaque-2 control [3]. 
Therefore, we analyzed the 5' leader for the con- 
sensus sequence of the Opaque 2-binding site 
GATGAPyPuTGPu [24]. No match between this consensus 
sequence and the 5'-flanking region was found. Appar- 
ently the lysine degradative pathway operates in the 
seeds of various higher plants and might be confined 
to them; hence we looked for homology with the Sph 
box CATGCATG, a cis-regulatory element conferring 
seed-specific expression [11, 30]. No match was found 
with this sequence either. Sequence analysis for other 
binding sites of known plant transcription factors [20] 
did not show any perfectly conserved binding sites. 
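
A search of this kind can be reproduced with regular expressions, writing Py as [CT] and Pu as [AG]. The sketch below is one possible way to scan both strands for the Opaque-2 consensus and the Sph box; it is not the software used in the study, and the function names and test sequence are hypothetical.

```python
import re

# Consensus motifs from the text; Py = pyrimidine [CT], Pu = purine [AG].
MOTIFS = {
    "Opaque-2 site": "GATGA[CT][AG]TG[AG]",
    "Sph box": "CATGCATG",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def find_motifs(seq: str) -> dict:
    """Return 0-based start positions of each motif on both strands.

    Positions on the '-' strand are given in reverse-complement coordinates.
    A lookahead is used so that overlapping matches are all reported.
    """
    seq = seq.upper()
    rc = reverse_complement(seq)
    hits = {}
    for name, pattern in MOTIFS.items():
        regex = re.compile(f"(?=({pattern}))")
        hits[name] = {
            "+": [m.start() for m in regex.finditer(seq)],
            "-": [m.start() for m in regex.finditer(rc)],
        }
    return hits


if __name__ == "__main__":
    # Made-up sequence containing one copy of each motif.
    print(find_motifs("TTTGATGACATGGAAACATGCATGTT"))
```

An exact-match search of this type reports only perfectly conserved sites, so a diverged but functional element would go undetected.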

The 3' terminus of the cDNA sequence was amp- 
lified using the 3' Race system, resulting in the form- 
ation of only one product, which subsequently was 
cloned and sequenced. The 3'-untranslated sequence 
extends 90 bases past the stop codon. A poly(A)+ addi- 
tion signal resembling the animal consensus sequence 
AATAAA is not seen at the 3' terminus of the cDNA 
sequence (Figure 2). In the case of the LKR-SDH gene 
it might be that the poly(A) addition signal is differ- 
ent from the consensus. Although some plant genes 
do have the unaltered AATAAA motif, plants seem to 
be more divergent in this motif and other sequences 
up and downstream of the poly(A) cleavage site might 
compensate for lack of an AATAAA sequence [16]. 
Another possibility would be that reverse transcription 
during the 3' Race started from an internal run of A 
residues in the mRNA and not from the true poly(A) 
tail. Although a consensus poly(A) signal is present 
261 bases past the stop codon, it does not seem likely 
to be the true polyadenylation signal, since during the 
3' Race only one band was generated and subsequent 
amplifications using nested gene specific primers did 
not lead to the appearance of additional bands (not 
shown). The same was true when the amount of the 
oligonucleotide primer used for reverse transcription 
was decreased (not shown). 

Figure 4. Comparison of the deduced amino acid sequences of three fungal genes encoding SDH (lysine-forming) with the A. thaliana LKR. 
The alignment was created by the program PileUp (GCG Package). Amino acid residues that are identical between at least one of the fungal 
proteins and the plant protein appear in the consensus. 

Southern blot analysis 

Southern blot analysis was used to determine the num- 
ber of genes encoding LKR-SDH in A. thaliana (not 
shown). Total DNA was digested with SstI and NsiI 
and hybridized with a digoxigenin-labeled cDNA probe 
corresponding to the first 1.7 kb of the LKR-coding 
sequence (see Materials and methods). The expected 
band lengths can be deduced from the diagram in Fig- 
ure 3. Digestion with SstI yielded a 4.7 kb and a 4.2 kb 
band. The latter band originates from digestion of a 
third SstI site (not shown), which is present 4.2 kb 
upstream of the first SstI site in the LKR gene. Diges- 
tion with NsiI yielded two bands of 3.3 and 1.6 kb as 
expected. Under the conditions applied it thus appears 
that the LKR-SDH gene is present as a single copy in 
A. thaliana. 
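
The reasoning behind the expected band lengths is that a probe lights up every restriction fragment it overlaps. The sketch below illustrates this with hypothetical coordinates: the first SstI site in the gene is placed at position 0, the upstream site at -4200 and the next site at 4700, and the genomic footprint of the probe is assumed to straddle position 0. Only the 4.2 kb and 4.7 kb band sizes come from the text; the site positions and probe span are assumptions made for illustration.

```python
def hybridizing_fragments(site_positions, probe_start, probe_end):
    """Lengths of restriction fragments that overlap the probe footprint.

    site_positions: cut positions along the genomic DNA (any order).
    probe_start, probe_end: genomic interval covered by the probe.
    """
    sites = sorted(site_positions)
    return [
        right - left
        for left, right in zip(sites, sites[1:])
        if left < probe_end and right > probe_start
    ]


if __name__ == "__main__":
    # Hypothetical SstI map in bp, relative to the first SstI site in the gene.
    sst_sites = [-4200, 0, 4700]
    print(hybridizing_fragments(sst_sites, probe_start=-500, probe_end=2500))
```

With a single gene copy, the list of predicted fragments should account for every band on the blot; additional bands would point to further copies.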

Deduced amino acid sequence and sequence 
comparisons 

Pairwise comparisons between the deduced amino acid 
sequence of the Arabidopsis LKR-SDH and the four 
fungal SDH protein sequences were made with the 
computer program Bestfit (GCG package, Wisconsin). 
The deduced LKR protein from Arabidopsis shows 
an identity of about 25% with the fungal proteins. The 
homology increases to about 50% with the inclusion of 
conservative substitutions (data not shown). Alignment 
of the deduced Arabidopsis SDH protein sequence 
shows a sequence identity of 37% and similarity of 
57% to the glutamate-forming SDH from S. cerevisi- 
ae. Optimal alignments between the Arabidopsis LKR 
and the three fungal lysine-forming SDHs were made 
with the program PileUp (GCG package), (Figure 4) 
and between the Arabidopsis SDH and the yeast SDH 
with Bestfit (Figure 5). The alignments of the LKR and 
SDH homologues reveal several stretches of conserved 
residues that may be important for the function of this 
enzyme (Figures 4 and 5). 

Lack of a chloroplast transit peptide sequence 

Enzymes involved in lysine biosynthesis have been loc- 
ated in the chloroplasts of plants [5] and many of the 
enzymes have been shown to be synthesized in the form 
of preproteins [13, 14, 17, 32]. The preproteins have 
amino-terminal extensions, chloroplast transit peptides 
(CTPs), which direct them from the cytoplasm into 
the chloroplast and which are subsequently removed 
from the protein upon entering the latter [21]. The Ara- 
bidopsis LKR-SDH gene studied in this work encodes 
a protein that appears to lack an N-terminal chloroplast 
targeting sequence, since it disagrees with at least three 
observations made by von Heijne et al. in a comparison 
of 26 CTPs [31]. The second amino acid is not an alan- 
ine; there are 4 charged groups in the first 10 residues 
(Glu-7, 8, 9 and Lys-10); and serine, which is present at a 
level of 20% in CTPs, is not enriched in the first 100 
residues of the LKR-SDH protein (Figure 2). Further- 
more, homology to the fungal SDH (lysine-forming) 
protein begins at amino acid 21 of Arabidopsis LKR- 
SDH. Thus it appears unlikely that the Arabidopsis 
protein is targeted, implying that at least the first two 
steps of this lysine degradative pathway occur in the 
plant cell cytosol. 
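
The three criteria used above can be expressed as a short check on an N-terminal sequence. The sketch below is illustrative only. The ten-residue example, MNSNGHEEEK, is an inference: the first eight residues are a translation of the 5' amplification primer ATGAATTCAAATGGCCATGAGGAG given in Materials and methods, and residues 9 and 10 follow the statement that Glu occupies positions 7 to 9 and Lys position 10. Counting D, E, K and R as the charged residues is likewise an assumption, and the function name is hypothetical.

```python
CHARGED = set("DEKR")  # assumed definition of charged residues


def ctp_features(protein: str, head: int = 10, window: int = 100) -> dict:
    """Features relevant to the three chloroplast transit peptide criteria."""
    protein = protein.upper()
    region = protein[:window]
    return {
        "second_is_alanine": len(protein) > 1 and protein[1] == "A",
        "charged_in_head": sum(residue in CHARGED for residue in protein[:head]),
        "serine_fraction": region.count("S") / len(region) if region else 0.0,
    }


if __name__ == "__main__":
    # Inferred first ten residues of Arabidopsis LKR-SDH (see lead-in).
    print(ctp_features("MNSNGHEEEK"))
```

For the inferred N-terminus the second residue is asparagine rather than alanine and four of the first ten residues are charged, matching the observations in the text. The serine criterion needs the first 100 residues and cannot be judged from this fragment.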

Expression of LKR-SDH in E. coli 

Fragments encoding either the LKR domain or the 
SDH domain or the complete bifunctional protein were 
generated using PCR primers with appropriate restric- 
tion sites. The amplified fragments were digested, lig- 
ated to a prokaryotic expression vector and transformed 
into E. coli (see Materials and methods). The LKR 
domain alone or the complete protein coding sequence 
did not lead to the synthesis of detectable protein or 
enzyme activity (not shown). The failure to express 
these proteins was not due to mutations introduced by 
the amplification process. Efforts are now underway to 
express the bifunctional protein and the LKR domain 
in a eukaryotic expression system. In contrast, we 
were able to successfully express the SDH domain in 
E. coli leading to high protein levels and a high specif- 
ic activity (Figures 6 and 7). The SDH coding region 
encompasses 1.4 kb on the cDNA clone, which predicts 
a protein of 52676 Da. Extracts from IPTG induced 
cells that were transformed with the vector carrying 
the 1.4 kb insert were analyzed by SDS-PAGE and 
a protein at the expected size was overproduced in 
these cells (Figure 6). Separation of the cell extracts 
into supernatant (lane C) and pellet (lane D) fractions 
shows that substantial amounts of protein are present in 
both of them. No band of similar intensity was present 
in uninduced cells that carry the vector + insert, or 
induced cells that carry an empty vector (lanes B and 
E, respectively). 

Figure 5. Comparison of the deduced amino acid sequence of the 
S. cerevisiae SDH (glutamate-forming) and the A. thaliana SDH. 
The alignment was created by the program BestFit (GCG Package). 
Identical residues are designated by bars, conserved substitutions by ... 

Figure 6. SDS-PAGE of protein extracts. Cells were grown and 
protein extracts prepared as described in Materials and methods and 
subjected to SDS-PAGE. Size markers (lane A). Extracts from cells 
carrying vector + SDH insert (lanes B, C, D). Uninduced cells 
(lane B). IPTG-induced extract supernatant (lane C). IPTG-induced 
extract pellet (lane D). IPTG-induced empty vector (lane E). 

Figure 7. SDH activity in bacterial extracts. SDH activity was 
assayed as described in Materials and methods. The reaction was 
started by the addition of 5 µg of protein of the bacterial extract. 
Assay of extracts from cells without (A) or with SDH cDNA insert 
(B+C), assayed in the presence (A+C) or absence (B) of saccharopine. 
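
The predicted protein sizes quoted in this paper can be sanity-checked from the ORF lengths. The sketch below uses the rule-of-thumb average of about 110 Da per amino acid residue, which is an assumption and not the method used by the authors; the function name is hypothetical, and the ORF lengths are the rounded values given in the text (3.16 kb and 1.4 kb).

```python
AVERAGE_RESIDUE_DA = 110.0  # rule-of-thumb average residue mass (assumed)


def estimate_mass_kda(orf_bp: int, includes_stop: bool = True) -> float:
    """Rough protein mass in kDa from an ORF length in base pairs."""
    codons = orf_bp // 3
    residues = codons - 1 if includes_stop else codons
    return residues * AVERAGE_RESIDUE_DA / 1000.0


if __name__ == "__main__":
    print(f"3.16 kb ORF: about {estimate_mass_kda(3160):.1f} kDa (text: 117 kDa)")
    print(f"1.4 kb SDH region: about {estimate_mass_kda(1400):.1f} kDa (text: 52676 Da)")
```

Both estimates fall within a few kDa of the values reported from the actual sequences, which is as close as this approximation can be expected to come.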

SDH activity was measured in the soluble frac- 
tion of the bacterial extracts (Figure 7). As expected, 
no SDH activity was observed in extracts from cells 
transformed with an empty vector (column A). Extracts 
from cells containing the SDH cDNA insert converted 
substantial amounts of NAD+ to NADH (column C). 
The reaction was specific for SDH in that no signific- 
ant activity was observed in the absence of the SDH 
substrate saccharopine (column B). 

Similar to the maize and mammalian enzyme, activ- 
ity of the Arabidopsis SDH increases from pH 6.0 to 
9.0 and retains at these pH values 10% activity when 
NAD+ is replaced by NADP+ (data not shown). 



Discussion 

As a first approach towards understanding the 
physiological role of the lysine breakdown pathway 
in plants, we have isolated and characterized a gene 
encoding LKR-SDH from A. thaliana. The gene 
encoding the LKR-SDH protein covers about 6.2 kb 
of the Arabidopsis genome. Sequence analysis of the 
cDNA revealed an ORF of 3.16 kb, which predicts 
a protein of 117 kDa. The alignment of the genomic 
and cDNA sequences shows that the LKR-SDH gene 
is interrupted by 24 introns. They are of small size, 
as expected for Arabidopsis introns, and are predom- 
inantly of the pyrimidine-rich class. Upstream from 
the putative transcription start point, TATAAA and 
CAAT sequences were found at positions consistent 
with those of functional TATA and CAAT boxes repor- 
ted previously for other eukaryotic genes [7]. Although 
existing data implicate the Opaque 2 transactivator 
in the regulation of expression of LKR 
in maize endosperm [3], a search in the 5'-flanking 
region of the Arabidopsis gene did not reveal an Opaque 
2 regulatory element. It is possible that the sequence of 
the Opaque 2 binding site diverges in the present case 
from the known consensus. Alternatively, regulation 
of LKR-SDH in different plants may vary or Opaque 
2 may affect LKR-SDH indirectly, for example by the 
induction of the synthesis of an intermediary regulatory 
molecule. LKR activity appears to be restricted to the 
seeds of plants [18, Falco et al., unpublished results]. 
Hence, we analyzed the 5' flanking region for a Sph 
box, a sequence element, which has been shown to be 
involved in the seed specific expression of several plant 
genes. No sequence resembling the Sph box was detec- 
ted. A functional analysis of the 5'-transcribed region 
will be needed to further elucidate the regulation of 
expression of LKR-SDH in Arabidopsis thaliana. 

The deduced LKR and SDH amino acid sequences 
from Arabidopsis show an identity of about 25% and 
37% and a similarity of 50% and 57%, respectively, 
to the corresponding fungal proteins. Although LKR 
and SDH reside on one polypeptide in Arabidopsis, 
we were able to functionally express SDH separately 
from the LKR domain in bacteria. This activity was 
similar in its biochemical characteristics to those of 
the corresponding enzyme purified from maize [15]. 
We have so far been unable to express either the LKR 
domain or the entire LKR-SDH protein in E. coli. In 
the amino acid sequence of the Candida albicans SDH 
(lysine-forming) protein, residues 194-224 have been 
indicated as being important in NADH binding [12]. 
Nine residues in this stretch of amino acids match a 
fingerprint determined by Wierenga et al. [33], the most 
important being three glycines at positions 6, 8, and 
11 and an acidic amino acid at the last position of the 
peptide. In the case of a NADPH-binding site the latter 
would be expected to be exchanged for a hydrophobic 
residue. An 'ADP-binding fold' or 'fingerprint' was 
not found in either the LKR or in the SDH domain 
of Arabidopsis as such. In some cases, however, vari- 
ations in this fingerprint have been reported [12, 33]. 

There are about 200 amino acid residues in the Ara- 
bidopsis LKR-SDH protein between the regions homo- 
logous to fungal SDH (lysine-forming) and fungal 
SDH (glutamate-forming), which is suggestive of an 
intermediary or 'spacer' region. However, to define the 
role of this region, a functional analysis of the LKR 
domain and isolation of other bifunctional LKR-SDH 
genes are necessary. 

In contrast to the lysine biosynthetic pathway, 
which appears to operate in the plastids of plant cells, 
our results suggest that the Arabidopsis protein is not 
targeted to the chloroplast, implying that at least the 
first two steps of this lysine degradative pathway occur 
in the cytosol of plants. A gene encoding a chloroplast- 
targeted isoform of the protein does not seem to exist, 
since standard Southern blot analysis using Arabidop- 
sis LKR cDNA as a hybridization probe suggested that 
there is a single copy of the gene encoding the bifunctional 
LKR-SDH protein. 

The results presented here have practical implica- 
tions. It has been shown that LKR-SDH participates 
in one of the major lysine breakdown pathways. Func- 
tion of this pathway interferes with the efficiency of 
lysine accumulation in seeds of transgenic crop plants, 
which were engineered to synthesize high levels of lys- 
ine ([9], Falco et al. unpublished results). Inactivation 
of LKR-SDH through genetic engineering therefore 
might be a feasible way to increase lysine accumula- 
tion on the one hand and avoid formation of undesired 
lysine breakdown products on the other hand. As a first 
step to accomplish this, we have used the Arabidopsis 
LKR-SDH gene to obtain the corresponding genes from 
soybean and corn. Furthermore, LKR has been shown 
to be also an important enzyme in mammalian cells. 
The human genetic disease familial hyperlysinemia is 
caused by the accumulation of lysine in mitochondria, 
which is caused by a defect in production of the LKR- 
SDH enzyme and hence results in a decrease or absence 
of lysine catabolism [25]. This study should simplify 
the isolation of the genes for these catabolic enzymes 
from animal sources, as it has for other plants. 

Evidence Appendix B 

Rule 132 Declaration of Dr. Carl Falco dated August 24, 2000. (Note: The 
original declaration can be found in the file of Application No. 08/823,771.) 



A copy of this declaration accompanies the Response After Final submitted on 
February 4, 2008 and was entered by the Examiner, Office Communication dated 
March 5, 2008. 






PATENT 

IN THE UNITED STATES PATENT AND TRADEMARK OFFICE 
IN THE APPLICATION OF: 

SAVERIO C. FALCO                      CASE NO.: BB-1037-D 
SHARON J. KEELER 
JANET A. RICE 

APPLN. NO.: 08/823,771                GROUP ART UNIT: 1638 

FILED: MARCH 24, 1997                 EXAMINER: E. MCELWAIN 

FOR: CHIMERIC GENES AND METHODS 
FOR INCREASING THE LYSINE AND 
THREONINE CONTENT OF THE 
SEEDS OF PLANTS                       DATE: AUGUST 24, 2000 



Assistant Commissioner for Patents 
Washington, DC 20231 

Sir: 

Declaration of Dr. Carl Falco Pursuant to 37 CFR §1.132 



I, Saverio Carl Falco, am a citizen of the United States of America, residing at 
1902 Miller Road, Arden, Delaware 19810, United States of America, and I declare as 
follows: 

1. I am one of the above-identified inventors named in this application. I am 
a graduate of Rutgers University of New Brunswick, New Jersey with a B.A. degree 
granted in 1971 with high honors and distinction in physics. I received a Ph.D. in 
1977 from the University of Chicago in biochemistry and molecular biology. From 
1977 to 1981 I was a National Institutes of Health postdoctoral fellow at the 
Massachusetts Institute of Technology. I have been employed by E. I. du Pont de 
Nemours and Company since 1981 directing and conducting research in plant genetic 
engineering. 

2. I have reviewed the Office Action dated April 25, 2000. I am aware that 
this declaration is being submitted to address the concerns set forth on pages 4 and 5 of 
the Office Action that "the specification does not disclose any plants that comprise the 
claimed two gene fragments that result in the claimed increase in lysine relative to a 
plant that does not comprise said two gene fragments. In addition, the specification 
fails to provide guidance with regard to the choice of subfragments that will result in 
the antisense inhibition or cosuppression of LKR." 

3. At the outset, it is noted that many components of the process of plant 
genetic engineering, e.g. construction of chimeric genes for expression in plant cells, 
or for blocking expression of endogenous genes, transformation of plants, have 
become routine for those skilled in the art. Notwithstanding this, what follows is 
intended to show that one of ordinary skill in the art could follow the teachings of the 
instant application to practice the claimed invention without engaging in undue 
experimentation. 

4. First, the rationale for combining the nucleic acid fragments of the invention 
is clearly disclosed in the specification. It was shown, for the first time, that 
accumulation of excess free lysine in plant seeds, accomplished via expression of 
lysine insensitive DHDPS, is accompanied by breakdown of free lysine and 
accumulation of intermediates in the breakdown pathway such as saccharopine. Thus, 
there was a clear incentive to reduce the loss of excess lysine due to catabolism. 

5. Second, methods were provided to prevent lysine catabolism through 
reduction in the activity of the enzyme lysine ketoglutarate reductase (LKR), which 
catalyzes the first step in lysine breakdown. This can be accomplished by introducing 
a mutation in the plant gene that encodes LKR that reduces or eliminates enzyme 
function. Such mutations can be identified by screening mutants for lysine 
over-producer lines that do not accumulate the lysine breakdown products, saccharopine 
and α-amino adipic acid. Alternatively, the first nucleic acid fragments containing 
plant LKR cDNAs were disclosed. The nucleotide sequences of these fragments 
make it straightforward to isolate LKR nucleic acid fragments from any plant desired 
(see point 6 below). Chimeric genes for expression of antisense LKR RNA or for 
cosuppression of LKR in the seeds of plants can then be created. The chimeric LKR 
gene can be linked to chimeric genes encoding lysine insensitive AK and DHDPS and 
all introduced into plants via transformation simultaneously, or the chimeric LKR 
gene or mutant LKR gene can be brought together with chimeric genes encoding 
lysine insensitive AK and DHDPS by crossing plants to create hybrids carrying two 
or more of the genes (see below). 

6. Third, examples of all of the nucleic acid fragments of the invention were 
provided in the specification of the subject case. In the case of the bifunctional 
protein lysine ketoglutarate reductase (LKR)/saccharopine dehydrogenase (SDH), two 
plant nucleic acid fragments (SEQ ID NOS:102 and 103) containing cDNA derived 
from the plant Arabidopsis thaliana were provided in the present patent application. 
In the application it was stated that full length cDNAs encoding plant LKR plus 
saccharopine dehydrogenase (SDH) or genomic DNAs containing the entire 
LKR/SDH gene can be readily identified by hybridization to labelled cDNA 
fragments of SEQ ID NO:102 or SEQ ID NO:103 and thus isolated. This was, in 
fact, accomplished and is described in Epelbaum, S., McDevitt, R. and Falco, S. C. 
(1997) "Lysine-ketoglutarate reductase and saccharopine dehydrogenase from 
Arabidopsis thaliana: nucleotide sequence and characterization". Plant Mol. Biol. 35, 
735. 

The availability of the Arabidopsis LKR/SDH gene made it straightforward for 
us, as it would be for anyone skilled in the art, to isolate other plant LKR/SDH genes. 
Degenerate oligonucleotides were designed based upon highly conserved regions of 
the deduced amino acid sequence of plant and fungal proteins and used to amplify 
soybean and corn LKR/SDH cDNA fragments. Near full-length cDNAs for soybean 
and corn LKR/SDH were then isolated using 5' RACE and hybridization to cDNA 
libraries. LKR/SDH nucleic acid fragments were isolated from several other plant 
species including wheat and rice by identifying EST sequences homologous to the 
already known plant LKR/SDH sequences. 

7. Fourth, there is a description of how to use these nucleic acid fragments to 
practice the invention. In the case of LKR/SDH, the availability of plant LKR/SDH 
genes made it possible to block expression of the LKR/SDH gene in transformed 
plants via antisense inhibition or cosuppression. It was stated in the Office Action on 
page 4 that antisense inhibition and cosuppression of a gene in a plant is 
unpredictable. This is true only in the sense that every transformant does not produce 
the desired phenotype. But one skilled in the art is well aware of this and designs the 
experiment in a way that many transformants are obtained and screened for the 
desired phenotype. 

My own experience with cosuppression methodology in plants, as well as my 
knowledge of the work of my colleagues, and research work in the broader scientific 
community, indicates that this method is reliable and predictable. The use of 
cosuppression to block expression of several different genes in several different plants 
has been achieved successfully at DuPont. 

Specifically in the case of LKR/SDH, cosuppression has been used to block 
expression with the first gene fragment and promoter combination tested, which 
hardly represents undue experimentation (see point 10 below). 

8. It is stated on page 5 of the Office Action that "De Luca teaches that 
modifying metabolic pathways by transforming plants with genes that control steps of 
the pathway is highly unpredictable and often the desirable results are impossible to 
achieve." This may be true in cases where not enough is known about the metabolic 
pathway, but in the case of the lysine biosynthetic and catabolic pathways, it has been 
demonstrated how to increase production of lysine via modification of the 
biosynthetic pathway using lysine insensitive DHDPS and AK, and shown that 
accumulation of free lysine in seeds is also controlled by catabolism of lysine. We 
teach that blocking the first step in lysine catabolism will lead to increased 
accumulation of lysine and this is, in fact, what we have observed as described below. 

9. The corn LKR/SDH cDNA sequence was used to identify transposon 
mutations in the endogenous corn LKR/SDH gene via PCR screening of a library of 
corn lines containing Robertson's Mutator transposon insertions. The precise location 
of Mutator insertions into the LKR/SDH gene was determined by sequencing of 
genomic DNA from individual mutants. An insertion mutation located in an exon in 
the LKR domain of the gene was chosen for further study. Southern blot analysis of 
corn genomic DNA indicated that corn contains only one LKR/SDH gene. Since an 
insertion mutation is expected to block function of the gene, it was anticipated that 
such a mutation would be recessive. One fourth of the progeny seed from a selfed 
corn ear with such a mutation segregating would be expected to be homozygous for 
the mutation. It was observed that approximately one fourth of such seed exhibited a 
higher level of free lysine than normal (5 to 15 fold higher) without the increase in the 
lysine catabolite saccharopine that is seen when free lysine is increased via expression 
of lysine insensitive DHDPS. It was concluded that knocking out LKR/SDH, by 
itself, was able to increase seed lysine content in corn seeds. 
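
The one-fourth expectation in paragraph 9 follows from simple Mendelian segregation. The short sketch below is only an illustration of that arithmetic, not part of the declaration: it enumerates the gamete combinations from selfing a plant heterozygous for the insertion. The allele labels "L" (wild type) and "l" (Mutator insertion) and the function name are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def selfed_progeny_ratios(parent=("L", "l")):
    """Genotype fractions expected from selfing a plant carrying the two given alleles.

    "L" and "l" are illustrative labels only: "L" for the wild-type LKR/SDH
    allele and "l" for the recessive insertion allele.
    """
    # Each parent gamete carries one allele; pair every gamete with every gamete.
    counts = Counter("".join(sorted(pair)) for pair in product(parent, repeat=2))
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}

ratios = selfed_progeny_ratios()
for genotype, fraction in sorted(ratios.items()):
    print(genotype, fraction)
```

Under these assumptions only the "ll" class, one fourth of the progeny, is homozygous for the insertion, which matches the proportion of high-lysine seed reported above.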

The LKR/SDH Mutator insertion line was crossed by a transgenic line that 
accumulates excess free lysine due to expression of lysine insensitive DHDPS and 
AK. In this cross two genetic loci that affect lysine accumulation, one of which is 
recessive (the LKR/SDH Mutator insertion) and one of which is semi-dominant (the 
lysine insensitive DHDPS and AK transgene locus), are segregating. Single seeds 
were analyzed for lysine and saccharopine content. The most striking observation 
from this experiment is that the highest lysine containing seeds have low levels of 
saccharopine (see figure). The low saccharopine level indicates that these seeds are 
homozygous for the LKR/SDH Mutator insertion, while the high lysine level indicates 
that they carry the lysine insensitive DHDPS and AK transgene locus. The level of 
lysine accumulation is considerably higher (2-3 fold) than the level provided by the 
DHDPS and AK transgene locus alone. Thus, this experiment demonstrates that an 
increase in the accumulation of lysine, accompanied by a reduction in accumulation 
of lysine catabolites can be accomplished by combination of lysine overproduction 
brought about by expression of lysine insensitive DHDPS + AK and reduction of 
lysine catabolism by blocking expression of LKR/SDH, as we taught in the patent 
application. These results show that the concern stated in the Office Action on page 5 
that "modifying metabolic pathways ... is highly unpredictable and often the desirable 
results are impossible to achieve" is unfounded in this particular case. 

10. As indicated above, LKR/SDH expression has been blocked in corn via 
cosuppression. To accomplish this a chimeric gene designed for cosuppression of 
LKR was constructed by linking a 1268 bp LKR/SDH gene fragment, which included 
the LKR coding domain, to the corn endosperm 27 kD zein promoter and 10 kD zein 
3' untranslated region. This chimeric gene was introduced into corn by particle-gun 
mediated transformation. Of 72 transformation events that were regenerated into 
plants and produced seed, 13 had seeds with a greater than four fold increase in free 
lysine. This is a typical frequency for cosuppression events. Since the transformed 
plants were out-crossed, the transgenic locus must be dominant or there would not 
have been any observable phenotype. This is expected from a cosuppression 
transgene, and is an advantage over knock-out mutations like the LKR/SDH Mutator 
insertion described above. 
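
For readers who want the frequency in paragraph 10 as a number, the sketch below computes the observed proportion (13 of 72 events) and an approximate 95% Wilson score interval. It is an illustration only and not part of the declaration; it assumes the transformation events can be treated as independent trials, and the function name is hypothetical.

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """Approximate Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return center - half, center + half

events_with_phenotype = 13  # events with a >4-fold free lysine increase (paragraph 10)
events_total = 72           # events regenerated into plants that produced seed

frequency = events_with_phenotype / events_total
low, high = wilson_interval(events_with_phenotype, events_total)
print(f"observed frequency: {frequency:.1%}")
print(f"approximate 95% interval: {low:.1%} to {high:.1%}")
```

The interval is wide because only 72 events were scored, so it should be read as a rough bound on the cosuppression frequency rather than a precise estimate.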

Some of the LKR cosuppression transformants have been carried forward for 
further testing. An event that has continued to show the increased free lysine 
phenotype for several generations and behaves genetically as a single locus transgene 
insertion has been selected for crossing to the transgenic line that accumulates excess 
free lysine due to expression of lysine insensitive DHDPS and AK. Results from that 
experiment are not yet available, but the expectation is that seeds carrying both 
transgene loci will have higher lysine levels than either parent, as was observed in the 
LKR Mutator insertion cross described above. In addition, co-transformation 
experiments in which the chimeric gene designed for cosuppression of LKR described 
above has been combined with a chimeric gene for expression of lysine insensitive 
DHDPS and introduced into corn by particle-gun mediated transformation are in 
progress. This is expected to yield transformants that produce seeds with the high 
lysine level observed in the LKR Mutator insertion cross by lysine insensitive 
DHDPS and AK, but with both chimeric genes at a single genetic locus, which is 
highly desirable for corn breeding. 

In summary, all of the elements of the claimed invention were provided in the 
patent application. The teachings in this case are in the public domain, due to the 
issuance of U. S. Patent 5,773,691 of which the instant application claims priority as a 
divisional application. One skilled in the art can take these elements, as discussed 
above, and practice the invention without undue experimentation. 

I declare further that all statements made herein of my own knowledge are true 
and that all statements made on information and belief are believed to be true, and 
further that these statements are made with the knowledge that willful false statements 
and the like so made are punishable by fine or imprisonment, or both, under Section 
1001 of Title 18 of the United States Code and that such willful false statements may 
jeopardize the validity of the application or any patent issuing thereon. 

Evidence Appendix C 

Rule 132 Declaration of Dr. Carl Falco dated February 16, 2001 
(Note: the original declaration can be found in the file of Application No. 
08/823,771.) 

A copy of this declaration accompanies the Response After Final submitted on 
February 4, 2008 and was entered by the Examiner, Office Communication dated 
March 5, 2008. 

PATENT 

IN THE UNITED STATES PATENT AND TRADEMARK OFFICE 
IN THE APPLICATION OF: 

SAVERIO C. FALCO CASE NO.: BB-1037-D 

SHARON J. KEELER 
JANET A. RICE 

APPLN.NO.: 08/823,771 GROUP ART UNIT: 1638 

FILED: MARCH 24, 1997 EXAMINER: E. MCELWAIN 

FOR: CHIMERIC GENES AND METHODS 
FOR INCREASING THE LYSINE AND 
THREONINE CONTENT OF THE 

SEEDS OF PLANTS Date: FEBRUARY 16, 2001 

Assistant Commissioner for Patents 
Washington, DC 20231 

Sir: 

Declaration of Dr. Carl Falco Pursuant to 37 CFR §1.132 

I, Saverio Carl Falco, am a citizen of the United States of America, residing at 
1902 Millers Road, Arden, Delaware 19810, United States of America, and I declare 
as follows: 

1. I am one of the above-identified inventors named in this application. I am 
a graduate of Rutgers University of New Brunswick, New Jersey with a B.A. degree 
granted in 1971 with high honors and distinction in physics. I received a Ph.D. in 
1977 from the University of Chicago in biochemistry and molecular biology. From 
1977 to 1981 I was a National Institutes of Health postdoctoral fellow at the 
Massachusetts Institute of Technology. I have been employed by E. I. du Pont de 
Nemours and Company since 1981 directing and conducting research in plant genetic 
engineering. 

2. I have reviewed the Office Action dated November 22, 2000. I am aware 
that this declaration is being submitted to address the concerns set forth on page 3 of 
the Office Action that the "Declaration of Falco teaches use of a bifunctional 
LKR/SDH gene to identify mutants produced by transposon mutagenesis. This plant 
does not contain a foreign LKR gene. In addition, the Declaration of Falco teaches of 
a combination DHDPS gene without an AK gene. Thus, the Declaration of Falco 
does not teach a plant with a foreign LKR gene and a foreign DHDPS gene ... it 
remains unpredictable what the results would be of introducing just the LKR gene and 
the DHDPS gene into a plant." 

3. It was stated in paragraph 10 of my declaration previously submitted on 
August 24, 2000 that a co-transformation experiment in which a chimeric gene 
designed for co-suppression of LKR was combined with a chimeric gene for 
expression of lysine insensitive DHDPS was in progress. That experiment was 
expected to yield transformants that produced seeds with higher free lysine levels than 
transformants from a parallel experiment using the DHDPS gene alone. The results of 
those experiments have now been obtained and they do confirm the prediction that 
transformants comprising the chimeric gene designed for co-suppression of LKR and 
the chimeric gene for expression of lysine insensitive DHDPS produced seeds with 
higher free lysine levels than transformants from a parallel experiment using the 
DHDPS gene alone. These results are depicted in Figure 2 and Table 1. 

4. The chimeric genes used for the experiments were: 

i) corn globulin 1 promoter/corn chloroplast transit sequence/ 
Corynebacterium dapA gene/corn globulin 1 3' UTR; and 

ii) corn 27 kD zein promoter/fragment of corn LKR-SDH cDNA/corn 10 kD 
zein 3' UTR 

Seeds from many transformation events from each experiment were analyzed 
for free lysine content. It is clear from the data presented in Figure 2 that the best 
seeds obtained from the co-transformation experiment had considerably higher free 
lysine levels than the best seeds obtained from the transformation experiment where 
only the DHDPS gene was used. The average free lysine level from the 30 highest 
lysine seeds, or from the 70 highest lysine seeds, was about 2-fold higher for the 
co-transformation experiments compared to the DHDPS only experiment. 

5. It also was stated in paragraph 10 of my previous declaration submitted on 
August 24, 2000 that an LKR co-suppression transformant which showed an 
increased seed free lysine phenotype for several generations, and behaved genetically 
as a single locus transgene insertion, was crossed to a transgenic line that accumulates 
excess free lysine due to expression of lysine insensitive DHDPS and AK. Results 
from that experiment, which were not available at the time of the previous 
declaration, have confirmed the expectations expressed there, namely that seeds 
carrying both transgene loci will have higher free lysine levels than either parent. The 
data are presented in Figure 1. 

6. In this experiment described in paragraph 5 above, transgenic lines 
homozygous for an insertion of DHDPS and AK genes, or homozygous for the 
co-suppressing LKR/SDH gene, were each crossed to a wild type corn line or to each 
other. The F1 progeny seed from these crosses are hemizygous for the DHDPS and 
AK transgenic insertion, the co-suppressing LKR/SDH transgenic insertion, or both. 
Each cross was repeated at least 5 times, and seeds from the resulting corn ears were 
harvested and analyzed for free lysine levels. The results depicted in Figure 1 are 
averages derived from these repetitions. These results show the dramatic increase in 
free lysine resulting from the combination of increasing the synthesis of lysine via 
expression of the DHDPS gene and blocking the major pathway for lysine catabolism 
by co-suppressing the LKR/SDH gene. 

7. Parenthetically, it is noted that a concern was raised in the Office Action 
dated November 22, 2000 that results from combining the DHDPS and AK transgenic 
insertions with a co-suppressing LKR/SDH transgenic insertion would not be 
predictive of combining a DHDPS only transgenic insertion with a co-suppressing 
LKR/SDH transgenic insertion. It is noted that there is evidence in the subject 
application that AK plays a secondary role to DHDPS for increasing the synthesis of 
lysine. 

For example, it was demonstrated for (i) rapeseed transformants on page 31 
at lines 18-24 of the specification that: 

"Transformants expressing DHDPS protein showed a greater than 100-fold 
increase in free lysine level in their seeds. There was a good correlation between 
transformants expressing higher levels of DHDPS protein and those having higher 
levels of free lysine. One transformant that expressed AKIII-M4 in the absence of 
Corynebacteria DHDPS showed a 5-fold increase in the level of free threonine in the 
seeds. Concomitant expression of both enzymes resulted in accumulation of high 
levels of free lysine, but not threonine." 

And for (ii) corn transformants (page 33 at lines 15-24): 
"Free lysine levels in the seeds is increased from about 1.4% of free amino 
acids in control seeds to 15-27% in seeds of transformants expressing 
Corynebacterium DHDPS alone from the globulin 1 promoter. The increased free 
lysine was localized to the embryo in seeds expressing Corynebacterium DHDPS 
from the globulin 1 promoter. 

The large increases in free lysine result in significant increases in the total seed lysine 
content. Total lysine levels can be increased at least 130% in seeds expressing 
Corynebacterium DHDPS from the globulin 1 promoter. Greater increases in free 
lysine levels can be achieved by expressing E. coli AKIII-M4 protein from the 
globulin 1 promoter in concert with Corynebacterium DHDPS." 



8. Thus, the gene encoding lysine insensitive AK can enhance the effect of the 
DHDPS gene on lysine synthesis by increasing overall flux through the biosynthetic 
pathway. However, AK does not increase lysine when expressed without DHDPS. It 
is the DHDPS gene that is necessary for increasing the synthesis of lysine. The 
presence of the AK gene along with the DHDPS gene in the cross described above is 
inconsequential with respect to proof of the concept that the combination of 
increasing lysine synthesis (which can be achieved using the DHDPS gene alone or in 
combination with the AK gene) and blocking lysine catabolism (which can be 
achieved by blocking expression of the LKR/SDH gene via co-suppression) works 
better than either alone. 

9. The genetic cross experiment and the co-transformation experiment 
described above, taken together with the detailed description of the invention 
provided in the patent application and the previous declaration, clearly demonstrate 
that an increased lysine content is achieved when a lysine insensitive DHDPS gene 
(with or without a lysine insensitive AK gene) is combined with a co-suppressing 
LKR gene. 

I declare further that all statements made herein of my own knowledge are true 
and that all statements made on information and belief are believed to be true, and 
further that these statements are made with the knowledge that willful false statements 
and the like so made are punishable by fine or imprisonment, or both, under Section 
1001 of Title 18 of the United States Code and that such willful false statements may 
jeopardize the validity of the application or any patent issuing thereon. 

Regulation of Lysine Catabolism through Lysine-Ketoglutarate 
Reductase and Saccharopine Dehydrogenase in Arabidopsis 

Guiliang Tang,¹ Daphna Miron,¹ Judith X. Zhu-Shimoni, and Gad Galili² 

Department of Plant Genetics, Weizmann Institute of Science, Rehovot 76100, Israel 

In plant and mammalian cells, excess lysine is catabolized by a pathway that is initiated by two enzymes, namely, 
lysine-ketoglutarate reductase and saccharopine dehydrogenase. In this study, we report the cloning of an Arabidopsis 
cDNA encoding a bifunctional polypeptide that contains both of these enzyme activities linked to each other. RNA gel 
blot analysis identified two mRNA bands: a large mRNA containing both lysine-ketoglutarate reductase and 
saccharopine dehydrogenase sequences and a smaller mRNA containing only the saccharopine dehydrogenase sequence. 
However, DNA gel blot hybridization using either the lysine-ketoglutarate reductase or the saccharopine dehydrogenase 
cDNA sequence as a probe suggested that the two mRNA populations apparently are encoded by the same gene. To test 
whether these two mRNAs are functional, protein extracts from Arabidopsis cells were fractionated by anion exchange 
chromatography. This fractionation revealed two separate peaks: one containing both coeluted lysine-ketoglutarate 
reductase and saccharopine dehydrogenase activities and the second containing only saccharopine dehydrogenase 
activity. RNA gel blot analysis and in situ hybridization showed that the gene encoding lysine-ketoglutarate reductase and 
saccharopine dehydrogenase is significantly upregulated in floral organs and in embryonic tissues of developing seeds. 
Our results suggest that lysine catabolism is subject to complex developmental and physiological regulation, which 
may operate at gene expression as well as post-translational levels. 



INTRODUCTION 



In the cell, the level of the essential amino acid lysine is subject 
to tight regulation in both mammals and plants. In both 
types of organisms, excess lysine is catabolized via saccharopine 
and α-aminoadipic semialdehyde into α-aminoadipic 
acid and glutamate (Møller, 1976; Bryan, 1980; Markovitz et 
al., 1984; Galili et al., 1994; Galili, 1995; Goncalves-Butruille 
et al., 1996). The first enzyme in the lysine catabolic pathway 
is lysine-ketoglutarate reductase (LKR), which condenses 
lysine and α-ketoglutarate into saccharopine and uses the 
cofactor NADPH (Figure 1, reaction 1). The second enzyme, 
saccharopine dehydrogenase (SDH), converts saccharopine 
into α-aminoadipic semialdehyde and glutamate (Figure 1, 
reaction 2). This enzyme uses NAD+ or, much less efficiently, 
NADP+ as a cofactor (Markovitz et al., 1984; 
Goncalves-Butruille et al., 1996). 

The molecular and biochemical regulation of lysine catabolism 
is still not clearly understood. Feeding lysine to rats or 
applying it to tobacco plants stimulated the activity of LKR 
in rat livers or in tobacco seeds, respectively (Foster et al., 
1993; Karchi et al., 1994). Stimulation of this enzyme has 
also been observed in transgenic tobacco seeds overproducing 
lysine because of expression of a feedback-insensitive 
bacterial dihydrodipicolinate synthase (Karchi et al., 1995). 
This suggests that in both mammalian and plant cells, lysine 
may autoregulate its own catabolism. In addition, recent studies 
have shown that in tobacco seeds, the lysine-dependent 
stimulation of LKR activity is mediated by an intracellular signaling 
cascade requiring Ca²⁺ and protein phosphorylation 
(Karchi et al., 1995). The control of LKR activity in plants may 
be even more complex. In developing maize seeds, LKR activity 
was found to be reduced by two- to threefold in the high-lysine 
opaque2 mutant, as compared with wild-type plants 
(Brochetto-Braga et al., 1992). Opaque2 is a transcription 
factor that regulates the expression of seed storage proteins 
(Shotwell and Larkins, 1988). This transcription factor could 
also complement the yeast GCN4 transcription factor that 
regulates the expression of many yeast genes encoding enzymes 
involved in amino acid metabolism (Hinnebusch, 1988). 

¹ Both authors contributed equally to this work. 
wiccmail.weizmann.ac.il; fax 972-8-9344181. 

Although LKR and SDH appear to control important pro- 
cesses, their structural aspects and cellular functions differ 
among various eukaryotic species. In yeast cells, in which 
lysine is synthesized via α-aminoadipate (Bhattacharjee, 
1985), LKR and SDH play essential roles in lysine biosynthesis, 
and they appear as two separate polypeptides (Ogawa 
and Fujioka, 1978). In mammalian cells, which cannot syn- 
thesize lysine, LKR (LYS1) and SDH (LYS9) play an essential 
role in the catabolism of excess cellular lysine (Dancis et al., 
1969), but their structural aspects may vary among species. 

Figure 1. Lysine Catabolism via the Saccharopine Pathway. 
Reaction 1 is catalyzed by LKR, which condenses L-lysine and 
α-ketoglutarate into saccharopine. Reaction 2 is catalyzed by SDH, 
which hydrolyzes saccharopine into α-aminoadipic semialdehyde 
and glutamic acid. 



In rat liver, LKR and SDH were shown to be distinct monofunctional 
enzymes (Noda and Ichihara, 1978); however, human 
placenta (Fjellstedt and Robinson, 1975) and bovine 
liver (Markovitz et al., 1984) possess these two enzyme activities 
on a single bifunctional protein. In plants, which synthesize 
lysine via diaminopimelate, LKR and SDH also 
function in lysine catabolism (Arruda and da Silva, 1983), 
and recently, a bifunctional LKR/SDH enzyme has been purified 
from developing maize seeds (Brochetto-Braga et al., 
1992). Moreover, in plants, LKR and SDH activities have 
been detected only in developing seeds to date (Arruda and 
da Silva, 1983; Karchi et al., 1994). 

To elucidate further the regulatory role of LKR and SDH in 
lysine catabolism, we have cloned and characterized two 
cDNAs encoding a bifunctional LKR/SDH and a monofunc- 
tional SDH from Arabidopsis. We also show that Arabidopsis 
cells contain an mRNA species encoding a bifunctional LKR/ 
SDH and another mRNA encoding a monofunctional SDH 
and that these are likely to be transcribed from a single 
gene. In addition, we have determined that expression of the 
Arabidopsis LKR/SDH gene is subject to spatial and devel- 
opmental controls. 



RESULTS 



Identification and Characterization of an Arabidopsis 
cDNA Encoding a Monofunctional SDH 

From the Arabidopsis sequence databases, we have identified 
an expressed sequence tag (EST) clone showing significant 
homology with SDH from yeast (clone 23A3T7). The 
complete nucleotide sequence of this cDNA was determined. 
As shown in Figure 2, this ~1.5-kb cDNA (designated 
cAt-SDH) contains an open reading frame initiated by 
an ATG codon with a consensus (G/A) at position -3 (Joshi, 
1987) and encodes a putative protein of 482 amino acids. In 
addition, this cDNA contains a 5' noncoding sequence of 51 
nucleotides and a 3' noncoding sequence of 80 nucleotides 
that is ended by a poly(A) tail. As shown in Figure 3, the 
open reading frame of cAt-SDH has significant homology 
with yeast SDH (LYS9), sharing 36.1% identity and 56.4% 
similarity. The initiation ATG and the stop codons of both 
yeast and the putative Arabidopsis SDH also appeared at 



Figure 2. Nucleotide and Deduced Amino Acid Sequence of cAt-SDH. 
The ATG and TAG initiation and stop codons of the open reading 
frame encoding the putative SDH protein are in boldface and underlined. 
The asterisk indicates the protein termination site. The GenBank 
accession number is U90523. 






[Figure 3: amino acid sequence alignment of cAt-SDH with yeast SDH (LYS9); the alignment itself is not recoverable from the source.] 
The top line indicates the amino acid sequence of cAt-SDH; the bottom 
line is that of the yeast SDH (LYS9). Identical amino acids are 
indicated by bars; highly similar amino acids are indicated by colons; 
and similar amino acids are indicated by a single dot. The asterisks 
indicate the protein termination sites. 



very similar positions along the open reading frames. How- 
ever, the Arabidopsis SDH has a short amino acid sequence 
(Figure 3, positions 26 to 45) that is not present in the yeast 
SDH. The 5' noncoding region of cAt-SDH contains an addi- 
tional ATG consensus codon in a coding frame different 
from that of the putative SDH open reading frame and is im- 
mediately followed by a stop codon. Whether this ATG has 
any functional role is still not known. 
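
As a rough illustration of what a percent identity figure such as the 36.1% quoted above measures, the sketch below counts identical residues in a pair of already aligned sequences. It is not the authors' method: the alignment program and the denominator they used are not stated in this excerpt, so dividing by the full alignment length is an assumption, and the two short sequences are invented toy strings, not the cAt-SDH or LYS9 sequences.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent of alignment columns with identical residues.

    Assumes the two sequences are already aligned (equal length, gaps marked
    with `gap`) and divides by the full alignment length; other conventions
    (for example, excluding gapped columns) give different values.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b and a != gap)
    return 100.0 * matches / len(seq_a)

# Hypothetical toy alignment, used only to exercise the function.
toy_a = "MKSNG-AGRV"
toy_b = "MKQNGLA-RV"
identity = percent_identity(toy_a, toy_b)
print(f"{identity:.1f}% identity over {len(toy_a)} aligned columns")
```

A similarity figure such as the 56.4% above would additionally count conservative substitutions, which requires a substitution matrix and is not shown here.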

To test whether cAt-SDH encodes an SDH enzyme, the 
entire coding sequence of this cDNA was subcloned into ei- 
ther the pUC18 or pET-15b bacterial expression vectors and 
used to transform Escherichia coli cells. As shown in Figure 
4, bacterial cells harboring either of these plasmids contain- 
ing the cAt-SDH insert have significantly elevated levels of 
SDH activity, as compared with control bacteria harboring 
the expression plasmids with no inserts. 



Analysis of SDH mRNA Levels in Different Arabidopsis 

To test the expression of the Arabidopsis SDH gene in different 
tissues, total RNA was extracted from cell cultures, 
leaves, stems, roots, flowers, and young seedlings, and the 
levels of SDH mRNA were analyzed by hybridization with the 
cAt-SDH DNA as a probe. As shown in Figure 5, two major 
cross-hybridizing mRNA bands were detected. One had the 
expected size of ~1.5 kb corresponding to cAt-SDH, and a 
second larger mRNA was ~3.5 kb. Both mRNA bands were 
detected in all tissues after a long exposure time (data not 

Cloning and Characterization of an Arabidopsis cDNA 
Encoding a Putative Bifunctional LKR/SDH Polypeptide 

Previous studies have shown that plants, like mammals, may 
have bifunctional LKR/SDH enzymes (Goncalves-Butruille et 
al., 1996). Therefore, we hypothesized that the ~3.5-kb 
mRNA band, shown in Figure 5, encodes a bifunctional LKR/ 
SDH in which the SDH region is highly homologous to the 
monofunctional SDH mRNA. To examine this possibility, we 
screened an Arabidopsis cDNA library with the ~1.5-kb SDH 
cDNA as a probe. Several positive clones were rescued, and 
the longest (~3.2-kb insert, designated cAt-LKR/SDH) was 
sequenced. As shown in Figure 6, cAt-LKR/SDH contains a 
long open reading frame of 3195 nucleotides encoding a protein 
of 117 kD. The size of the encoded protein is similar to 
the size of the bifunctional LKR/SDH recently purified from 



Figure 4. cAt-SDH Encodes a Functional SDH Enzyme. 
The coding sequence of cAt-SDH was subcloned into two different 
bacterial expression vectors and transformed Into E. coli. Protein ex- 
tracts from bacteria hartxjring the plasmids containing cAt-SDH, as 
well as control bacteria transformed with the expression vector lack- 
ing cAt-SDH, were analyzed for SDH activity with (+) or without (-) 
the substrate saccharopine (sacch.). Each histogram represents an 
average of three separate activity tests ±SE. Letters above error 
bare represent significant differences at the 5% level, as determined 
by an ANOVA test, -insert, the empty vector not c< 
SDH; +lnsert, the vector with cAt-SDH. 
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Fiffure 5. RNA Get Blot Analysis of cAt-SOH mRNA. 
Twenty micrograms of tot^ RNA Uom c«U cuitur«$ (lane i^ young 
seedfo^gs flane 2), flofal organs (lane 3). teaves (lar>e 4), iitems (lane 
5), and roots (tow 6) was fractionated by gei eleclrophoresis and ei- 
ther hybrWied on an RNA cxiI bloi with cAt-SDH used as a pr<*>9 
(top) or stained v»ith sthidium bromkJe as a comtol (bottom). The mi- 
gration of the laS aivi 28S rHNAs is slv^wn at rigtit. The positions ol 
the monohjn«ion^ SOH ntfiNA (-1.5 Kb) and the bifurwtioftal LKfV 
SOH mnNA (<^'3.5 kb) are indicated at tefi. 



maize (125 kD; Goncalves-Butruille et al., 1996) arrt from 
soybean (123 kD; D. Miron, S. Ben-Yaacov, 0, Retries, and 
G. Galiti, manuscript In prep»ation). This open reading frame 
is flanked by a 5' noncoding sequence of 62 nucleotides ami 
a 3' noncoding sequaioe of 10 nucteotidBS. The cAt-LKfV 
SOH cONA also lacks a 3' polyfA) t»l, suggesting that its 3' 
region is noi cc»tiplete. IntarEetingly, the 3' 1S10 rajcleotides 
Of cAt-LKR/SDH are 100% homologous to nucleotides 1 to 
1510 erf cAt-SW encocflng the mono*uncti«iai SDH {trf, Rg- 
ure2), 
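
The relation between the reported open reading frame length (3195
nucleotides) and the reported protein size (117 kD) can be checked
with a rough calculation. The sketch below is illustrative only: the
average residue mass of 110 Da is a common rule of thumb and is an
assumption, not a value taken from the article, as is the assumption
that the 3195 nucleotides include the stop codon. The function name is
hypothetical.

```python
# Rough mass estimate for the protein encoded by the cAt-LKR/SDH ORF.
ORF_NUCLEOTIDES = 3195        # ORF length reported in the text
AVG_RESIDUE_MASS_DA = 110.0   # assumption: rule-of-thumb mass per residue


def estimate_protein_mass_kd(orf_nt: int, includes_stop: bool = True) -> float:
    """Return an approximate protein mass in kD from an ORF length in nt."""
    if orf_nt % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_nt // 3
    # Assumption: the stop codon, if counted in the ORF, encodes no residue.
    residues = codons - 1 if includes_stop else codons
    return residues * AVG_RESIDUE_MASS_DA / 1000.0


if __name__ == "__main__":
    print(f"{estimate_protein_mass_kd(ORF_NUCLEOTIDES):.0f} kD")
```

Under these assumptions the estimate comes out close to the 117 kD
stated in the text; a sequence-based calculation would be needed for
an exact value.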

As shown in Figure 7, the N-terminal part of the putative protein
encoded by cAt-LKR/SDH (460 amino acids) exhibits significant homology
to the yeast monofunctional LKR, with 24.9% identity and 52.1%
similarity. The ATG initiation and stop codons of the yeast and the
putative Arabidopsis LKR proteins also appear at comparable places
along the open reading frame (Figure 7). However, the Arabidopsis LKR
also has several small amino acid sequences that are not present in
the yeast LKR (Figure 7).
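
Percent identity and similarity figures such as those quoted above are
derived from the match line of a pairwise alignment. The sketch below
shows one way to compute them from a match line that uses the symbols
described in the figure legends (bar, colon, single dot). It is a
minimal illustration: the toy match line is invented, the function
name is hypothetical, and dividing by the full alignment length is an
assumption, since the article does not state which denominator its
alignment software used.

```python
def alignment_scores(markup: str) -> tuple:
    """Percent identity and similarity from an alignment match line.

    '|' marks identical residues, ':' highly similar residues and
    '.' similar residues; any other character is a mismatch or gap.
    """
    length = len(markup)
    if length == 0:
        raise ValueError("empty alignment")
    identical = markup.count("|")
    similar = identical + markup.count(":") + markup.count(".")
    # Assumption: both percentages are taken over the full alignment length.
    return 100.0 * identical / length, 100.0 * similar / length


if __name__ == "__main__":
    toy_match_line = "||:. |.  :"   # hypothetical 10-column match line
    identity, similarity = alignment_scores(toy_match_line)
    print(f"identity {identity:.1f}%, similarity {similarity:.1f}%")
```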

The 5' noncoding region of cAt-LKR/SDH contains three ATG triplets
located seven to 41 nucleotides upstream of the presumed ATG
translation initiation codon of the LKR/SDH open reading frame. These
ATG codons form small open reading frames of nine to 15 amino acids,
and none of these ATG codons contains the (A/G) consensus at position
-3, which is generally found before eukaryotic translation initiation
codons (Joshi, 1987), suggesting that these ATG triplets may have
limited if any function in translational initiation.
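
The -3 criterion mentioned above can be expressed as a simple scan
over a nucleotide sequence. The sketch below reports, for every ATG,
whether a purine (A or G) sits three nucleotides upstream of the A of
the ATG. The example sequence is invented for illustration and is not
the cAt-LKR/SDH 5' region; the function name is hypothetical.

```python
PURINES = {"A", "G"}


def atg_contexts(seq: str):
    """Yield (index, has_purine_at_minus3) for every ATG in seq.

    index is the 0-based position of the A of the ATG. Position -3 is
    the third nucleotide upstream of that A (there is no position 0).
    """
    seq = seq.upper()
    start = seq.find("ATG")
    while start != -1:
        minus3 = seq[start - 3] if start >= 3 else None
        yield start, minus3 in PURINES
        start = seq.find("ATG", start + 1)


if __name__ == "__main__":
    toy = "CCTATGCCACCATGG"   # hypothetical sequence, not from the article
    for index, ok in atg_contexts(toy):
        print(index, "purine at -3" if ok else "no purine at -3")
```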

Amino acid sequence alignment of the deduced polypeptide product of
cAt-LKR/SDH with the yeast monofunctional LKR and SDH (Figures 2 and
6) shows that the putative cAt-LKR/SDH-encoded protein contains an
intermediate region (amino acids 462 to 582, shown in boldface letters
in Figure 6) that is not present in either the yeast LKR or the SDH
enzymes. Although the functional significance of this region is still
not known, intermediate regions previously have been found in other
bifunctional polypeptides, such as the aspartate kinase/homoserine
dehydrogenase isozyme of the aspartate family pathway (Ghislain et
al., 1994; Galili, 1995).

To test whether the ~3.5-kb mRNA detected on the RNA gel blot shown in
Figure 5 is related to cAt-LKR/SDH, the same blot was washed to remove
the cAt-SDH probe and rehybridized with the putative LKR domain of
cAt-LKR/SDH. As shown in Figure 8, this hybridization detected the
~3.5-kb mRNA band corresponding to cAt-LKR/SDH but not the ~1.5-kb
mRNA band corresponding to cAt-SDH.

To determine further whether the N-terminal part of cAt-LKR/SDH
encodes an LKR enzyme, the entire coding sequence of this cDNA was
subcloned into the bacterial expression vector pET-15b and used to
transform E. coli cells. Bacterial cells harboring this plasmid had
SDH but no LKR activity (data not shown). Because bacterial cells did
not produce an active LKR, we attempted to express the Arabidopsis LKR
protein in yeast cells. Yeast has a monofunctional LKR enzyme, so we
subcloned the N terminus of the presumed LKR domain of cAt-LKR/SDH
into the yeast expression vector pVT-102u and transformed this plasmid
into the yeast Lys1 mutant. As shown in Figure 9, yeast cells
harboring this plasmid have significantly higher LKR activity than do
control cells transformed with the same plasmid without the LKR
insert, thereby confirming our supposition that cAt-LKR/SDH indeed
encodes a bifunctional LKR/SDH enzyme.



Organization of the LKR and SDH Genes in Arabidopsis

Based on the cDNA sequence identity between cAt-SDH and the 3' half of
cAt-LKR/SDH (cf. Figures 2 and 6) and the presence of two mRNA
species, corresponding in sizes to both cAt-SDH and cAt-LKR/SDH
(Figure 5), we wanted to determine whether these two cDNAs are
clustered within a single locus. To investigate whether the two cDNAs
were derived from a single gene, Arabidopsis DNA was digested with
several restriction enzymes, fractionated by agarose gel
electrophoresis, and hybridized on DNA gel blots by using the SDH cDNA
as a probe. After 1 week of autoradiography, the membrane was stripped
and rehybridized with the LKR probe. As illustrated in Figures 10A and
10B, a comparison of the two autoradiographies shows that a signal
appeared at exactly the same position when digested with both EcoRI
and BamHI. These results suggest that the cAt-LKR/SDH and cAt-SDH are
derived from a single gene. In the HindIII and BglII digests, the LKR
and SDH probes highlighted different bands, apparently because the
Arabidopsis LKR/SDH gene has a number of introns (G. Tang and
G. Galili, unpublished data) that contain single or multiple sites for
some of the restriction enzymes used for digestions.



Primer Extension Analysis of the Monofunctional
SDH mRNA

The top line indicates the amino acid sequence of the LKR domain of
the cAt-LKR/SDH; the bottom line is that of the yeast LKR (LYS1).
Identical amino acids are indicated by bars; highly similar amino
acids are indicated by colons; and similar amino acids are indicated
by a single dot. The asterisk indicates the protein termination site.

To characterize further the putative monofunctional SDH mRNA observed
on the RNA gel blots, we synthesized a 26-bp antisense DNA primer
homologous to a region located 20 nucleotides downstream of the ATG
translation initiation codon of cAt-SDH (Figure 2, nucleotides 75 to
100). This primer was then hybridized with total RNA from Arabidopsis
flowers, and the hybrid molecules were used as templates for reverse
transcription in a primer extension reaction. As shown in Figure 11,
this reaction generated a DNA band of 54 nucleotides that extended
approximately five nucleotides upstream of the cAt-SDH translation
initiation ATG codon.
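
The coordinates reported for the primer extension experiment are
mutually consistent, which a short calculation makes explicit. The
sketch below uses only the numbers given in the text (primer at cDNA
nucleotides 75 to 100, 20 nucleotides downstream of the ATG, 54-
nucleotide product). The position of the ATG is not stated in this
passage; it is inferred here under the assumption that exactly 20
nucleotides separate the G of the ATG from the first primer position.
The function name is hypothetical.

```python
PRIMER_START, PRIMER_END = 75, 100   # cAt-SDH cDNA coordinates (from the text)
GAP_AFTER_ATG = 20                   # nt between ATG and primer region (from the text)
PRODUCT_LENGTH = 54                  # extension product length in nt (from the text)


def extension_geometry(primer_start, primer_end, gap_after_atg, product_length):
    """Return primer length, inferred ATG start, and product 5' end."""
    primer_length = primer_end - primer_start + 1
    # Assumption: the gap is counted from the nucleotide after the G of ATG.
    atg_start = primer_start - gap_after_atg - 3
    # The product begins at the primer's 5' end, which pairs with
    # primer_end on the mRNA, and runs toward the mRNA 5' end.
    product_start = primer_end - product_length + 1
    return {
        "primer_length": primer_length,
        "atg_start": atg_start,
        "product_start": product_start,
        "nt_upstream_of_atg": atg_start - product_start,
    }


if __name__ == "__main__":
    print(extension_geometry(PRIMER_START, PRIMER_END,
                             GAP_AFTER_ATG, PRODUCT_LENGTH))
```

Under this reading the primer is 26 nucleotides long and the product
ends five nucleotides upstream of the inferred ATG, in line with the
values stated in the paragraph above.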



Arabidopsis Cells Contain Bifunctional LKR/SDH and
Monofunctional SDH Isozymes

To determine whether the two mRNAs derived from the Arabidopsis
LKR/SDH gene were functional in translating bifunctional LKR/SDH and
monofunctional SDH isozymes, we partially purified LKR and SDH from an
Arabidopsis cell culture by using an anion exchange column, after
polyethylene glycol (PEG) fractionation. As shown in Figure 12,
elution from the anion exchange column resolved two distinct SDH
peaks. The first was eluted at ~90 mM KCl and contained only SDH
activity, whereas the second peak was eluted at ~190 mM KCl and had
both SDH and LKR activities. The level of SDH activity in the peak
that did not show LKR activity was ~3.5-fold higher than the level in
the peak containing both coeluted SDH and LKR activities. Moreover,
under the excess substrate concentrations that were used in the
enzymatic assays (D. Miron, S. Ben-Yaacov, D. Reches, and G. Galili,
manuscript in preparation), LKR activity in this peak was
approximately fourfold higher than was SDH activity.



In Situ Hybridization with the SDH and LKR mRNAs
as Probes

We have shown that cAt-SDH mRNA is expressed to a high level in floral
tissues of Arabidopsis (Figure 5). To determine whether the expression
of both LKR/SDH and SDH mRNAs in Arabidopsis tissues is subject to
developmental regulation, particularly in reproductive organs, we used
LKR/SDH RNA probes for in situ hybridization analysis of Arabidopsis
flowers and seeds. Digoxigenin-labeled RNA probes from both LKR
(Figures 13A, 13D, and 13G) and SDH (Figures 13B, 13E, and 13H)
domains of the Arabidopsis LKR/SDH cDNA were used in this analysis. As
shown in Figures 13A and 13B, the LKR and SDH mRNA was highly abundant
in the ovules and vascular tissue of anther filaments but not in
pollen grains. In developing and mature seeds, hybridization signals
were found in the embryo (at either the globular [Figures 13G and 13H]
or torpedo [Figures 13D and 13E] stages) and in the outer layers of
the endosperm (Figures 13G and 13H). No signal was detected in the
control sections reacted with either the LKR (Figure 13C) or SDH
(Figure 13F) sense probes. The somewhat lower intensity of signal
obtained with the SDH probe compared with that of the LKR probe was
probably due to a lower amount of the SDH probe and possibly the lower
incorporation of digoxigenin during in vitro transcription used during
hybridization. This result indicates that the expression of both SDH
and LKR/SDH genes is regulated in a tissue-specific manner during
plant development.




Figure 8. RNA Gel Blot Analysis of cAt-LKR/SDH.
The same blot as shown in Figure 5 was stripped to remove the cAt-SDH
probe and rehybridized, with the LKR coding region of cAt-LKR/SDH as a
probe. Lane 1 contains RNA from cell cultures; lane 2, young
seedlings; lane 3, floral organs; lane 4, leaves; lane 5, stems; and
lane 6, roots. Lane 7 is the same as lane 3 shown in Figure 5
containing RNA from floral organs hybridized with cAt-SDH as a probe.
The migration of the 18S and 28S rRNAs is shown at left. The positions
of the monofunctional SDH mRNA (~1.5 kb) and the bifunctional LKR/SDH
mRNA (~3.5 kb) are shown at right.

Figure 9. The LKR Domain of cAt-LKR/SDH Encodes a Functional
LKR Enzyme in Yeast Cells.

The putative LKR domain of cAt-LKR/SDH was subcloned into a yeast
expression vector and transformed into yeast. Protein extracts from
two different yeast colonies (marked 1 and 2) harboring the plasmid
containing the Arabidopsis LKR, as well as control yeast cells
transformed with the expression vector without the insert, were then
analyzed for LKR activity in reactions containing (+lysine) or lacking
(-lysine) the substrate lysine. Letters above error bars represent
significant differences at the 5% level, as determined by an ANOVA
test. Each histogram is an average of three separate activity
determinations ±SE.



DISCUSSION 



Arabidopsis Contains Bifunctional LKR/SDH and
Monofunctional SDH Isozymes, Which May Be
Derived from a Single Gene

This report describes the cloning of LKR and SDH cDNAs from
Arabidopsis and shows that the structural and regulatory aspects of
LKR and SDH in plants are much more complex than what has been
previously elucidated for yeast and mammals (Bhattacharjee, 1985;
Feller et al., 1994). To date, either single LKR or SDH (yeast and
rat) or bifunctional LKR/SDH (human, bovine, maize, and soybean) has
been shown to exist within a given species; however, in this study, we
show that Arabidopsis cells contain two isozymic peaks, as deduced
from anion exchange chromatography. One of these peaks contains both
LKR and SDH activities, which presumably are located on a bifunctional
polypeptide encoded by cAt-LKR/SDH, and the other contains only SDH
activity. Although a bifunctional LKR/SDH enzyme has been reported
previously in maize, our results show that plant cells may also
contain a monofunctional SDH. In fact, we have recently purified the
SDH protein (shown in Figure 12 as the first SDH activity peak) to
homogeneity and found that it is a 53-kD protein, in agreement with
the expected size of a monofunctional SDH (data not shown).

Our results also strongly suggest that these two isozymes of LKR/SDH
and monofunctional SDH are translated from two distinct mRNAs, which
are produced from a single gene. We reached this conclusion based on
several lines of evidence: (1) detection of two mRNA bands with the
expected sizes of the isozymes (~1.5 and ~3.5 kb) on RNA gel blots
hybridized with the monofunctional SDH cDNA as a probe under
high-stringency conditions; (2) the presence of an in-frame "plant"
ATG consensus codon at the initiation of the SDH coding sequence (as
deduced from amino acid sequence homology with the yeast SDH), which
also gave rise to the production of an active recombinant
monofunctional SDH in bacteria; and (3) DNA gel blot analysis, which
suggested the presence of only a single gene in Arabidopsis that
hybridized with either the LKR or the SDH domains of cAt-LKR/SDH as
probes.



[Figure 10 panels: A, SDH probe; B, LKR probe.]

Figure 10. DNA Gel Blot Hybridization Pattern of cAt-SDH and
cAt-LKR/SDH.

(A) Genomic DNA was digested with several restriction enzymes as
indicated above the gel. Ten micrograms of digested DNA was separated
on a gel, transferred to a membrane, and hybridized under
high-stringency conditions with cAt-SDH.

(B) The same blot as shown in (A) was stripped and hybridized under
high-stringency conditions with the LKR domain of cAt-LKR/SDH as
probes.

The migration of the molecular length markers is indicated at left,
and their lengths are given in kilobases.

Figure 11. Primer Extension Reaction of Total Arabidopsis Flower
RNA with the Antisense Primer Located 20 to 46 Nucleotides
Downstream of the cAt-SDH ATG Initiation Codon.

The primer extension (PE) reaction product is indicated by an arrow;
A, G, C, and T indicate sequencing ladders of the same primer annealed
to the relevant genomic fragment. The sequence around the extended
product is indicated at left.



The presence of an mRNA encoding a monofunctional SDH was also
supported by the primer extension analysis shown in Figure 11.
However, the primer extension band was shorter than expected, based on
the 5' noncoding sequence of cAt-SDH, and terminated approximately
five nucleotides upstream of the translation initiation ATG of this
cDNA. The reason for the shorter than expected primer extension
fragment is still not known. However, computer analysis predicted that
the 5' noncoding region of cAt-SDH may contain a relatively stable
stem and loop structure (data not shown). Experiments are now in
progress in our laboratory to analyze whether this region may indeed
form stable secondary structures in vivo and whether these structures
may function in the regulation of the LKR/SDH gene expression.
Nevertheless, based on the primer extension results, we cannot yet
affirm whether cAt-SDH was derived from the monofunctional SDH mRNA or
is a truncated form of cAt-LKR/SDH.



Structural and Functional Properties of the Bifunctional
LKR/SDH Enzyme

Amino acid sequence alignment of cAt-LKR/SDH with the yeast
monofunctional LKR and SDH isozymes revealed that the plant
bifunctional enzyme possesses an intermediate region between the two
enzyme domains that was not present in any of the yeast enzymes.
Similar intermediate regions were also reported for other bifunctional
enzymes, such as bacterial and plant aspartate kinase/homoserine
dehydrogenase (Kalinowski et al., 1991; Ghislain et al., 1994). The
functional role of this intermediate region is still not known.
However, the fact that the LKR and SDH domains of the bifunctional
LKR/SDH can be directed into single functional enzymes (Figures 4 and
9; Markovitz and Chuang, 1987; Goncalves-Butruille et al., 1996)
suggests that this region may enable independent folding of the two
domains. In addition, because bifunctional LKR/SDH are generally
homooligomers (Markovitz et al., 1984; Goncalves-Butruille et al.,
1996), the intermediate domain may also function in its assembly, as
was previously reported for the bacterial bifunctional aspartate
kinase/homoserine dehydrogenase enzyme (Kalinowski et al., 1991).

Another interesting issue is whether the linkage between the LKR and
SDH domains has a regulatory significance, which may result from
"cross-talk" between the two domains. Although this issue is still not
solved, our study indicates that such cross-talk may indeed occur.
Upon fractionation on the anion exchange column and analysis under
conditions of excess substrates of LKR and SDH (D. Miron,
S. Ben-Yaacov, D. Reches, and G. Galili, manuscript in preparation),
the specific activity of SDH in the monofunctional SDH peak was much
higher than that in the bifunctional LKR/SDH peak. This difference
could not be explained by the differential degree of purification of
the two peaks because both peaks contained comparable levels of total
protein. The differences in SDH activity between the two isozymes also
could not be explained by differences in mRNA levels because the
intensity of the LKR/SDH mRNA band was slightly higher than that of
the monofunctional SDH mRNA (Figure 5). Thus, although we cannot yet
rule out the possibility of variation in translational efficiency or
protein stability, it is tempting to hypothesize that the activity of
SDH may be negatively regulated by its linked LKR domain. If indeed
such a control occurs in vivo, it is expected that plant species
producing only a single bifunctional LKR/SDH will accumulate
saccharopine (the product of LKR and the substrate of SDH; see
Figure 1), whereas those producing both isozymes will accumulate a
downstream metabolite of the catabolic pathway. Interestingly, whereas
lysine-overproducing transgenic soybean seeds, expressing a bacterial
dihydrodipicolinate synthase, were shown to accumulate saccharopine,
transgenic tobacco and canola expressing the same bacterial enzyme
accumulated the downstream metabolite α-aminoadipic acid (Falco et
al., 1995). Whether the differential accumulation of saccharopine and
α-aminoadipic acid in these plant species is related to differential
expression of the LKR/SDH and SDH isozymes still remains to be
demonstrated.

Figure 12. Fractionation of LKR and SDH Activities from Arabidopsis
Cell Culture on an Anion Exchange Column.
PEG-fractionated Arabidopsis cell culture extract was loaded onto a
DEAE-Sepharose column, washed, and eluted with a step gradient of 0 to
1 M KCl. The protein level, conductivity, and LKR and SDH activities
in each fraction are presented. mS, millisiemens.

Figure 13. In Situ Hybridization of Arabidopsis Flower and Seed
Tissues with LKR and SDH Antisense Probes.
(A), (D), and (G) LKR probe. (A) shows a longitudinal section of an
Arabidopsis flower. (D) shows cross-sections of seeds with a
torpedo-shaped embryo. (G) depicts cross-sections of seeds with a
globular-shaped embryo.

(B), (E), and (H) SDH probe. (B) shows a longitudinal section of an
Arabidopsis flower. (E) shows cross-sections of seeds with a
torpedo-shaped embryo. (H) shows cross-sections of seeds with a
globular-shaped embryo.

(C) and (F) Negative controls with LKR and SDH sense probes,
respectively. (C) shows a longitudinal section of an Arabidopsis
flower. (F) shows cross-sections of seeds with a torpedo-shaped
embryo.

em, embryo; en, endosperm; ov, ovules.



Expression of the LKR/SDH Gene Is Developmentally
Regulated

Although the LKR/SDH and monofunctional SDH mRNAs were detected in all
tissues tested, their levels varied among the different tissues. Both
mRNAs were significantly higher in floral organs than in vegetative
tissues (Figure 5). In addition, in situ mRNA hybridization using
reproductive organs showed that these mRNAs were most abundant in the
ovaries of developing flowers as well as in the embryos but not in the
endosperm tissues of developing and mature seeds. The spatial pattern
of LKR/SDH gene expression in developing flowers and seeds appears
very similar to that of the Arabidopsis gene encoding the bifunctional
aspartate kinase/homoserine dehydrogenase that leads to the synthesis
of lysine as well as threonine, methionine, and isoleucine
(Zhu-Shimoni et al., 1997). These results support our previous
hypothesis (Karchi et al., 1994) that genes encoding enzymes in lysine
biosynthesis and catabolism may be coordinately expressed during plant
development. We have also previously shown that the presence of excess
cellular lysine caused the stimulation of LKR activity in developing
tobacco seeds (Karchi et al., 1995). Therefore, it will be interesting
to test whether the coordinated expression of the LKR/SDH gene with
other genes encoding enzymes in lysine biosynthesis is due to common
transcriptional elements in their promoters or to a special regulation
of LKR/SDH gene expression by sensing the relatively high lysine
levels in cells in which lysine biosynthesis is upregulated.



Post-Transcriptional Regulation of LKR

The Arabidopsis SDH was active when expressed in bacterial cells;
however, LKR was not. This was not due to lack of expression, because
the LKR/SDH construct leads to the production of SDH but not LKR
activity in bacteria. Moreover, the lack of production of active
Arabidopsis LKR in bacteria was not due to a mutation in its sequence,
because the same DNA produced active LKR when expressed in yeast
cells. These results suggest that LKR may be activated by
post-translational modification, which does not operate in
prokaryotes. Indeed, we have recently found that the active LKR enzyme
from soybean is a phosphoprotein and that removal of its phosphate
residue(s) by alkaline phosphatase knocked out LKR activity in vitro
(D. Miron, S. Ben-Yaacov, H. Karchi, and G. Galili, submitted
manuscript).

METHODS



Plant Material 

Arabidopsis thaliana var C24 plants were grown in a greenhouse, and
different tissues were collected from the developing plants for the
isolation of the total RNA and in situ hybridization.

The cell culture of Arabidopsis ecotype Landsberg erecta was kindly
provided by M.J. May (University of Oxford, Oxford, UK; May and
Leaver, 1993). This culture was grown in MSMO liquid medium (Sigma),
pH 5.7, containing 3% sucrose, 0.05 mg/L kinetin, and 0.5 mg/L
naphthaleneacetic acid. The culture was placed on a rotary shaker at
110 rpm at 22°C in continuous fluorescent white light.



Cloning of the Full-Length cAt-LKR/SDH and cAt-SDH and
Subcloning Them into Expression Vectors

The expressed sequence tag (EST) clone 23A3T7 and the λZAP II cDNA
library (Kieber et al., 1993) were kindly provided by the Arabidopsis
Biological Resource Center (Columbus, OH). To clone the full-length
cAt-LKR/SDH from the λZAP II library, the cDNA from the EST clone was
used as a probe to screen the library, as previously described
(Sambrook et al., 1989). The plasmid containing the full-length
cAt-LKR/SDH was excised from the λZAP II by using a helper phage, and
its DNA sequence was determined by an automatic sequencer (model 373A,
version 1.2.0; Applied Biosystems, Foster City, CA).

For expression of the putative monofunctional SDH in bacteria, an SmaI
to XbaI DNA fragment containing the entire coding sequence of cAt-SDH
was subcloned by a translational fusion into EcoRI (blunt ended with
the Klenow fragment of DNA polymerase I) and XbaI sites of pUC18. For
subcloning into the bacterial expression vector pET-15b, the coding
sequence of cAt-SDH was excised with XbaI (blunt ended with the Klenow
fragment) and SalI and subcloned as a translational fusion into the
BamHI (blunt ended with the Klenow fragment) and XhoI sites of pET-15b
to form the plasmid pET-15b-SDH.

For expression of the LKR/SDH sequence in bacteria, cAt-LKR/SDH was
digested with EcoRI, which cleaves immediately after the LKR
translation initiation codon (ATGAATTC). The plasmid was then blunt
ended with the Klenow fragment, digested with NheI, which cleaves in
the SDH domain, and subcloned into the NcoI (blunt ended with the
Klenow fragment) and NheI sites of pET-15b-SDH, resulting in the
plasmid pET-15b-cAt-LKR/SDH.

For expression in yeast, pET-LKR/SDH was digested with XbaI, which
cleaves immediately upstream of the LKR translation initiation ATG
codon, and PstI, which cleaves in the SDH domain. The insert was then
inserted into the XbaI and PstI sites of pVT-102u, resulting in the
plasmid pVT-102u-LKR.



Production of Recombinant Proteins in Bacteria and Yeast

The expression plasmids were transformed into Escherichia coli
(Sambrook et al., 1989) and yeast cells (Ito et al., 1983) by using
general heat shock and LiOAc transformation methods, respectively.
Transformed bacterial cells were grown to mid-exponential phase (A600
of ~0.5 to 0.8) and then induced with 0.4 mM isopropyl
β-D-thiogalactopyranoside for an additional 4 hr. Transformed yeast
cells (mutant 8973b from A. Piérard, Université Libre de Bruxelles,
Brussels, Belgium; Ramos et al., 1986) were grown to mid-log phase in
liquid SC medium (Sherman et al., 1983) lacking uracil.



Processing of Bacteria and Yeast for Analysis of LKR and
SDH Activities

E. coli cells were precipitated, dissolved in one-tenth of buffer A
(25 mM potassium phosphate buffer, pH 7.5, containing 1 mM EDTA, 1 mM
DTT, and 10 µg/mL leupeptin), and sonicated. The total lysate was
precipitated at top speed (16,000g) in a tabletop centrifuge for 10
min at 4°C, and the supernatant was used for activity assays. Yeast
cells were precipitated, redissolved in one-tenth of buffer A, and
broken by vortexing with glass beads for half an hour at 4°C. The
lysate was precipitated again, and the supernatant was used for
activity assays.



DNA Gel Blot Analysis

Extraction of genomic DNA was performed according to the procedure in
Sambrook et al. (1989). DNA samples (10 µg) were electrophoresed in a
1% agarose gel and transferred to a Hybond N+ (Amersham) nylon
membrane. The blots were hybridized for 12 to 16 hr at 65°C with
32P-labeled probes containing either the LKR or SDH domain of
cAt-LKR/SDH. Hybridization was performed in 5 x SSC (1 x SSC is 0.15 M
NaCl, 0.015 M sodium citrate), 5 x Denhardt's solution (1 x Denhardt's
solution is 0.02% Ficoll, 0.02% PVP, and 0.02% BSA), and 1% SDS. Blots
were washed twice for 10 min at 65°C in 1 x SSPE (1 x SSPE is 0.15 M
NaCl, 10 mM sodium phosphate, 1 mM EDTA, pH 7.5) and 0.1% SDS,
followed by another wash in 0.1 x SSPE and 0.1% SDS. Radioactive bands
were detected by autoradiography. The hybridization probes included
either the 1454-bp SalI-NdeI fragment of cAt-SDH (SDH probe) or a
771-bp NotI-HindIII fragment from cAt-LKR/SDH in pBluescript SK-
(Stratagene, La Jolla, CA; LKR probe).



RNA Gel Blot Analysis

Total RNA was extracted from various tissues by using Tri-Reagent
(MRC, Inc., Cincinnati, OH), according to the protocol provided by the
manufacturer. RNA samples (20 µg) were electrophoresed in a 1% agarose
gel containing 2.2 M formaldehyde and 50 mM
3-(N-morpholino)propanesulfonic acid, pH 7.0, and transferred to a
Hybond N nylon membrane. Probe utilization, hybridization, and washing
were as described above for the DNA gel blots. The migration of the
28S and 18S rRNAs was visualized by ethidium bromide staining of the
gel before transfer to a membrane.



Partial Purification of the LKR and SDH from Arabidopsis
Cell Culture

A 1-week-old cell culture was filtered, and the resulting cell pellet
was frozen in liquid nitrogen and kept at -80°C until used. For
purification, the frozen pellet was ground with a mortar and pestle
and then homogenized using an Ultraturax (Ystral GmbH, Dottingen,
Germany) in an equal volume of buffer A. After centrifugation at
25,000g for 15 min, the supernatant was brought to pH 5.6 with solid
KH2PO4 and then fractionated with polyethylene glycol (PEG) 8000
between 7 and 14%. After fractionation with 14% PEG, the pellet was
resuspended in one-tenth the initial volume of buffer A and loaded
onto an anion exchange DEAE-Sepharose column (Pharmacia). After
washing the unbound protein, the column was eluted with a step
gradient of 0 to 1 M KCl in buffer A.



Analysis of LKR and SDH Activities

The kinetics of LKR activity was measured spectrophotometrically by
determining the rate of NADPH oxidation at 340 nm for 10 min at 30°C.
The activity assays included 50 µg of protein extract in 0.3 mL of
0.1 M Tris-HCl, pH 7.4, 20 mM lysine, 14 mM α-ketoglutarate, and
0.4 mM NADPH. Each reaction also included a control lacking the
substrate lysine. One unit of LKR was defined as the amount of enzyme
that catalyzes the oxidation of 1 nmol of NADPH per min at 30°C.

The kinetics of SDH activity was measured spectrophotometrically by
determining the rate of NAD+ reduction at 340 nm for 10 min at 30°C.
The activity assay included 50 µg of protein extract in 0.3 mL of
0.1 M Tris-HCl, pH 8.5, 2 mM saccharopine, and 2 mM NAD+. Each
reaction also included a control lacking the substrate saccharopine.
One unit of SDH was defined as the amount of enzyme that catalyzes the
reduction of 1 nmol of NAD+ per min at 30°C.
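
The unit definitions above translate an absorbance slope at 340 nm
into nmol of cofactor converted per minute. The sketch below shows one
way to do that conversion with the Beer-Lambert law. It is
illustrative only: the molar absorptivity of 6220 M^-1 cm^-1 for
NAD(P)H at 340 nm is the commonly used textbook value and the 1-cm
path length is an assumption, since neither is stated in the article;
the slopes in the example are invented and the function names are
hypothetical.

```python
EPSILON_340 = 6220.0       # M^-1 cm^-1 for NAD(P)H at 340 nm (assumed standard value)
PATH_LENGTH_CM = 1.0       # assumption: 1-cm cuvette
REACTION_VOLUME_ML = 0.3   # reaction volume given in the text
PROTEIN_UG = 50.0          # protein per assay given in the text


def units_from_slope(delta_a340_per_min, volume_ml=REACTION_VOLUME_ML,
                     epsilon=EPSILON_340, path_cm=PATH_LENGTH_CM):
    """Convert an absorbance slope (A340 per min) to nmol cofactor per min."""
    molar_per_min = abs(delta_a340_per_min) / (epsilon * path_cm)
    litres = volume_ml / 1000.0
    return molar_per_min * litres * 1e9   # mol -> nmol


def specific_activity(units, protein_ug=PROTEIN_UG):
    """Units per mg of protein."""
    return units / (protein_ug / 1000.0)


if __name__ == "__main__":
    sample_slope = 0.0700    # hypothetical slope with substrate
    control_slope = 0.0078   # hypothetical slope of the no-substrate control
    units = units_from_slope(sample_slope - control_slope)
    print(f"{units:.2f} units, {specific_activity(units):.1f} units/mg")
```

Subtracting the no-substrate control slope before conversion mirrors
the controls described for each reaction; the same function applies to
both assays because NADPH oxidation and NAD+ reduction are read at the
same wavelength.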
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Protein Determination 

Protein levels were determined by the method of Bradford (1976), using
the Bio-Rad protein assay kit.



In Situ Hybridization

For preparation of the hybridization probe, the LKR and SDH domains of
cAt-LKR/SDH were subcloned separately into the pBluescript SK-
plasmids. Digoxigenin-labeled sense and antisense probes were obtained
by in vitro transcription using the digoxigenin RNA labeling kit
(Boehringer Mannheim). Tissue preparation and in situ hybridization
were conducted as described by Drews (1995). An antisense probe and
the corresponding sense control probe were used in each experiment.



Primer Extension

Primer extension analysis was performed according to Sambrook et
al. (1989), with several modifications. Total RNA (10 µg) from flowers
was mixed with 32P-end-labeled antisense primer located 20 to 46
nucleotides downstream of the transcription initiation ATG codon of
cAt-SDH. The reaction was then incubated at 80°C for 10 min and
cooled slowly to room temperature for annealing. Reverse transcription
was conducted at 42°C for 1.5 hr. The reaction was stopped by
boiling for 10 min and cooling on ice, and the mixture was then
treated with RNase free of DNase for 30 min at 37°C. After ethanol
precipitation, the primer extension product was analyzed on a
sequencing gel along with a sequencing ladder of the same primer
annealed to the relevant genomic fragment. Radioactive bands were
detected by autoradiography.



Computer Analysis 

DNA sequence analyses were performed using the Genetics Computer
Group (Madison, WI) software package (version 8).

one member; and (c) a careful annotation
of functional features that avoids the pitfalls
described above.






Protein annotation: detective work for function prediction



Computer analysis of genome sequences is
currently one of the essential steps for
obtaining functional and structural information
about the respective gene products.
Database searches are used to transfer
functional features from annotated proteins
to the query sequences. With the increasing
amount of data, more and more software
robots perform this task'. While robots are
the only solution to cope with the flood of
data, they are also dangerous because they
can currently introduce and propagate
misannotations^'^. On the one hand, functional
information is often only partially transferred
(underprediction). For example, information
is not usually extracted for each functional
unit (protein domain) but just taken from
the one-line description of the best database
match (so multifunctionality is rarely
considered). On the other hand, overpredictions
are common because the highest-scoring
database protein does not necessarily
share the same or even similar functions.

Definition and collection of
uncharacterized protein families

To avoid unnecessary propagation of
poor annotation, we have collected putative,
poorly annotated proteins that are usually
labeled as 'hypothetical' or just as 'ORF'
(open reading frame). We operationally
defined uncharacterized protein families
(UPFs) to be families of proteins that: (1)
contain members in at least three taxonomically
distinct (and phylogenetically 'distant')
species; and (2) do not contain (to the best
of our knowledge) biochemically characterized
members.

A collection and classification of these
proteins should allow: (a) utilization of family
information and thus a more detailed
characterization; (b) simplification of update
procedures for the entire families if functional
information becomes available for at least



projects progress, more and more of these
UPFs emerge in sequence databases. We
gave high priority to families that contain
[...] two of the three major
kingdoms (archaea, eubacteria, eukaryotes).
The original 'family' definition was based
on significant hits in the statistics provided by
FASTA (Ref. 4) or gapped BLAST (Ref. 5).

Annotation of UPFs in SWISS-PROT
and PROSITE databases

A serial number has been assigned to
each UPF and to each of the corresponding
SWISS-PROT (Ref. 6) entries. A SWISS-PROT
document file lists all the current UPFs and
their members in SWISS-PROT. This document
is available on the WWW (Ref. 7). In
the majority of cases, PROSITE entries (Ref. 8)
have already been created to document the
respective family. Whenever a member of a
UPF family is biochemically characterized,
that family ceases to be considered as a UPF
and is deleted from the list. However,
information is provided that allows its history to
be traced. For example:

Family: UPF-0002 [DELETED]
Taxonomic range: Eubacteria
Comments: Now characterized as a family of pseudouridylate
synthases (EC 4.2.1.70).
Prototype: RSUA_ECOLI (Accession No. P3.W18)
PROSITE entry: PDOC00885

Function prediction for the UPFs

The annotation is handled rather conservatively
(see below) because functional
overpredictions are most dangerous given
the many opportunities for error propagation
in sequence databases^ *. Nevertheless,
we intended to retrieve as many functional
features as possible for each UPF
using comparative analysis. Thus, each
UPF was subjected to a variety of sequence
analysis methods (Ref. 9). In brief, several
members of each UPF were compared with a
database of non-identical protein sequences,
daily updated at the EMBL, using PSI-BLAST
(Ref. 5) with a conservative expected ratio
of false positives (E = 0.001) as a threshold
for each iteration. Sequences were
preprocessed by filtering for transmembrane
(Ref. 10) and coiled-coil regions (Ref. 11).
A multiple alignment was constructed for
each UPF using ClustalX (Ref. 12). If PSI-BLAST
did not identify a relationship to characterized
proteins, other iterative methods such as
Wisetools (Ref. 13) and Mast (Ref. 14) were
applied. They also use family information,
that is, give more weight to conserved positions
and so on, but have the advantage that the
underlying multiple alignments can be
checked and improved manually (at the cost
of speed and the 'easy to use' feature).
Finally, all searches were repeated using
sequences from entirely sequenced genomes
to reduce noise effects'.". For example,
PSI-BLAST E-values depend on the database
and a database match might be significant
using a small database but becomes insignificant
if more background noise (unrelated
or redundant sequences) is added.
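
The dependence of E-values on database size can be made concrete with
the standard relation between a bit score S' and the expected number of
chance hits, E = m * n * 2^(-S'), where m is the query length and n the
total database length in residues. The sketch below uses this simplified
relation only; PSI-BLAST applies further corrections (for example,
effective lengths), and the function names, the score, and the lengths
are invented for illustration. Only the threshold of 0.001 comes from
the text.

```python
THRESHOLD = 0.001  # per-iteration inclusion threshold, from the text


def expected_hits(bit_score, query_len, db_len):
    """Simplified Karlin-Altschul estimate: E = m * n * 2**(-S')."""
    return query_len * db_len * 2.0 ** (-bit_score)


def is_significant(bit_score, query_len, db_len, threshold=THRESHOLD):
    """True if the same bit score passes the threshold for this database."""
    return expected_hits(bit_score, query_len, db_len) <= threshold


if __name__ == "__main__":
    # Hypothetical 300-residue query with a 40-bit match.
    for db_len in (10**6, 10**9):
        print(db_len, expected_hits(40, 300, db_len),
              is_significant(40, 300, db_len))
```

With these invented numbers the same 40-bit match passes the threshold
against a database of 10^6 residues but fails against one of 10^9
residues, which is the effect described above.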

In many cases, the iterations revealed
the relationship of the UPFs with other
proteins, families or superfamilies. As the main
focus here was to assign functional features,
the iterations have not been continued when
a reasonable prediction could be made.
Criteria for the latter were matches to known
active site patterns or conserved motifs
resembling those in PROSITE as well as
positioning of UPF members within phylo-

identified in 13 (22%) of the 58 UPFs,
although functional predictions for these
13 have not been made. Of the remaining
43 UPFs, 25 could be related to proteins
with annotated functional features (Table 1).

Pitfalls in function assignments 

The predictions required careful inspection
of the functional annotations of the

difficulties, Table 2 shows the result of a
Blast search for UPF0002 that includes quite
a few proteins with annotations (in addition
to the first hits that are labeled as 'hypothetical').
Only one can give a clue about functional
features; others are simply wrong,
misleading or uninformative.

Another typical assignment error is
caused by the sequence similarity of the
query to a region that is independent from
the one that was the basis for the annotation.
For example, the hypothetical protein HI0722
(Accession No. P44842, ID: YIGZ_HAEIN),
a member of the UPF0029 family, shows
significant similarity to two proteins (GenBank
entries gi|2314657 and gi|2688341) in
Helicobacter pylori and Borrelia burgdorferi,
respectively, which are wrongly annotated
as proline dipeptidases (pepQ). The annotation
is based on the N-terminal homology of
these two proteins with the C-terminal region
of proline dipeptidase (pepQ) (gi|42358)
of E. coli, which does not harbor the catalytic
activity of this enzyme.

There were even examples in which
homologs scored best in PSI-BLAST (Ref. 5)
that did not have the same catalytic activity
because active site residues of the characterized
family were not conserved. However,
there were significantly lower scoring
homologs with perfect matches of their
(distinct) catalytic site residues to the query.
For example, the UPF0046 family has clear
amino acid similarity to proteases that are
easily found by PSI-BLAST (Ref. 5) in the
fourth iteration; yet, residues involved in
metal-binding are only shared with a purple
acid phosphatase family that is only picked
up in the ninth iteration. The E-value of
1e-5 compared with proteases (E-value of
5e-78) remains considerably higher in
subsequent iterations. Such instances have
implications for current function prediction
programs in which the function of the best
hit is transferred. Clearly, another generation
of methods is required that include
checks for the presence of functionally
important residues.

Use of phylogenetic trees

As most of the database proteins with 
functional annotations were only distantly 
related to members of the UPFs, transfer of 
functional information is extremely difficult 
iry. The majority of UPFs turned 
s, and based on 
ive site residues 
le that at least the basic catalytic
mechanism remains the same. This,
however, is of little predictive value as some
families, e.g. those with the α/β hydrolase
fold collected in SCOP (Ref. 16) are huge and
harbor numerous distinct catalytic activities,
such as lipases, esterases, dehalogenases,
peptidases, peroxidases and lyases. We have
therefore constructed phylogenetic trees of
selected members of the UPFs and of
related, but distinct families that have been
identified during the analysis (Fig. 1). On
some occasions, the UPF members clearly
clustered with proteins that all performed
the same function (Fig. 1a), but in most of
the cases the UPFs were of equal distance
to distinct enzymatic activities (Fig. 1b), thus
not allowing any detailed predictions.

Although the studied protein families 
were bound to be difficult for function 
predictions because a considerable num- 
ber of teams were unable to find functional 

Table 1. Predicted functional features for 25 UPFs

(Saccharomyces cerevisiae) >gi 1642221 (Z21618)

Annotations that hamper functional predictions illustrated by the example of the UPF0002 family. Based on the recent
experimental characterization of pseudouridylate synthase'^ this family has been deleted from the UPF list (see text). Nevertheless, the
various, partly contradictory annotations (bold) are extremely difficult to parse for automatic function prediction programs.
For brevity, the PSI-BLAST results have been cut (. . .).

[Figure 1 tree labels: GCAD_BACSU, GLMU_ECOLI, PIP_LACHE]



Nucleic Acids Res. 26, 38-42

7 http://www.expasy.ch/cgi-bin/lists?upflist.txt

8 Bairoch, A., Bucher, P. and Hofmann, K. (1998) Nucleic Acids
Res. 29, 217-221

9 Bork, P. and Gibson, T. (1996) Methods Enzymol. 266, 162-184

10 Von Heijne, G. (1992) J. Mol. Biol. 225, 487-494

11 Lupas, A., van Dyke, M. and Stock, J. (1991) Science 252, 1162-1164

12 Thompson, J.D. et al. (1997) Nucleic



FIGURE 1. (a) Phylogenetic trees of selected members of UPF0007 that indicate a likely
function as UPF0007 members with cytidyltransferase activities (red) and related
uridylyltransferases (blue) are more divergent (*pir database entry, pir|g64156; **pir
database entry, pir|s49238). (b) No clear enzymatic activity can be predicted for UPF0017
members: They clearly have the hydrolase fold but have equal distance to peroxidases
(red), esterases (green), peptidases (blue) and other hydrolases (pink) (***GenBank entry
gi|1001804). The tn

features therein, it is noteworthy that there
was not a single case in which we were
able to predict the precise mechanism and
the substrate specificity. Nevertheless, the
information about an enzymatic activity and
the likely reaction mechanisms of the 25
UPFs should prove useful for the analysis of
upcoming genome sequences.

Annotation with the right level of
precision helps in future projects

In summary, we were able to provide
some functional annotation for more than
700 of about 1300 proteins clustered in 25
of the 58 distinct UPFs. Most of them are
currently named 'hypothetical protein' so
that their annotation adds enormous value
to these sequences. For another 13 UPFs
currently containing about 250 proteins,
the presence of transmembrane regions
was recorded. This annotation is now being
incorporated into PROSITE and SWISS-PROT
so that these features can be assigned to
newly sequenced genes as well.

The difficulties we faced in assigning
functions by sequence similarity also indi-

by most of the software robots are probably
erroneous. Because of the current policies of
most of the sequence databases, correction

there should be a combined effort by the
database teams, the authors of the current
entries, and the community, to work towards
a careful functional annotation of all the
sequences that become publicly available.



calculated using CLUSTALX (Ref. 12).

The challenges of genome sequence annotation
or "The devil is in the details"

Temple F. Smith and Xiaolin Zhang



Two powerful, competing pressures are acting
on various genome sequencing projects:
One, to release new sequences as quickly as
possible; and two, to provide them with
maximally complete and accurate annotation.
This rather incongruent combination
has led to a strong interest in developing
efficient and accurate automated, large-scale
sequence annotation procedures.

There have, in fact, been a number of
attempts in both industry and academia to
speed new sequence annotation. In their
simplest form, these have been little more
than post-processors acting on standard
high-speed sequence similarity search tools
such as BLAST. The post-processing assigns
the annotation from the best-matched
previously known sequence to each new sequence.

This is, of course, a generalization of
successful approaches used by many researchers
to assign probable functions to new
sequences when previously studied and
recognizable homologs exist. However, when
applied in an automated manner to large
data sets with minimum review, such
approaches can lead to serious degradation
of the wealth of incoming genomic data.

There are more problems with the simple
best match functional annotation inheritance
(BMAI) than the two traditionally recognized,
those being the assessing of biological
significance in terms of match statistical significance,
and the choice between the sensitivity of the
very fast, but approximate, sequence similarity
search algorithms and the mathematically
rigorous, but much slower, optimal algorithms.

In the first place, it is easy to assign various
measures of confidence to new annotation
based on match statistics, and there is good
evidence that approximate maximum similarity
tools such as BLAST do nearly as well as
any of the slower, full dynamic programming
methods. Second, the newer versions of
BLAST have high sensitivity, identifying local
sequence pairwise similarities, including
alignment gaps. The inclusion of alignment
gaps was one of the main advantages of the
slower dynamic programming methods.

No, the major problems associated with nearly all of the current
automated annotation [...] database annotation inconsistencies (and a
few outright errors). This is particularly true for the large and often
complex protein families. Why are these the major problems, rather
than the two more obvious ones previously mentioned?

Clearly, for researchers studying a particular protein family, most
database annotation inconsistencies make little difference in the
search for new, even distant members. A local expert either knows the
range and/or history of the annotation terminology used by colleagues
in different subfields, or perhaps more importantly, the expert will
spend the time to backtrack apparent inconsistencies.

Even in those cases involving structurally complex proteins composed
of multiple domains, all of which may not be fully or properly
annotated, the expert generally carefully dissects matches to distinct
domains, and backtracks each domain's annotations. However, in the
large-scale genomic projects, having a local expert to work on each
protein family is not an option. Yet the integration of genomic
information across multiple protein families, multiple fields of
expertise and taxa, is just what is envisioned to form the foundations
of the next century's biology and biotechnology.

The basic problem of inconsistent nomenclature arises largely because
sequence information and its annotation derives from many diverse
subdivisions of the biological sciences during a time of rapid change
in our understanding. In an emerging field such as molecular biology,
let alone "comparative genomics," strictly controlled vocabularies
would not only be difficult to impose, but are probably undesirable!
The evolution and refinement of the vocabulary is an anticipated [...]
increasing knowledge.

Some inconsistencies are simple, such as the reference to tRNA
synthetases in fungi as tRNA ligases (which of course they are) or the
use by Americans and most Europeans of dihydroxyacetone-P for a
glycolytic intermediate that the Japanese and English generally call
glycerone-P. There are many c[...] equivalent, but different,
terminology. For example, in the well-studied G protein case, among 27
distinct G β-subunit GenBank/SWISS-PROT entries, there are 18 different
protein names or keyword sets. A list of synonyms can be constructed in
such cases, some of which will be species or field specific.

Figure 1. An example of genes having the potential for annotation
inheritance transitivity. The three two-domain proteins, b1262,
YKL211C, and b1263, share no single [...] Domains are indicated by
colors: red, N-

There are numerous cases in which proteins of very different current
functions are homologous in that they evolved from a common ancestor
and will match with significant sequence similarity. For example,
numerous proteins sharing multiple WD-repeats have been labeled
transducin-like or transducin homologs, yet share no common signal
transduction function¹. The rather widespread improper use of
synthetase for synthase and the converse, however, cannot be fixed by
a thesaurus, since whether the enzyme in question requires ATP or not
is not a matter of alternate terminology. Without the careful use of
synonym tables in
combination with review of commonly misused
terminology, any simple BMAI
approach will often end up propagating the
less desirable or erroneous annotations.

Random propagation of faulty annotation,
however, is only the tip of the annotation
problem iceberg. In the case of
multidomain proteins, most simple BMAI
approaches will at best annotate only the
most similar of the domains, and at worst
will attach the annotation of a nonshared
domain from the matched protein.

The first of these, incomplete annotation,
is seen in the recently released Escherichia coli
genome data for ORF b1262, a 453-residue,
multifunctional protein'. Here, the first 253
amino acid residues comprise the indole-3-glycerol
phosphate synthase domain. This
matches single-domain homologs in
Methanococcus jannaschii and Bacillus subtilis
and the carboxy-terminal domain of the protein
product of one yeast gene, YKL211C. The
second domain of the E. coli protein, residues
259 through 443, matches the N-phosphoribosyl
anthranilate isomerase, single-domain
protein in M. jannaschii, B. subtilis, and yeast
(and this function is currently unannotated).

An incorrect inheritance via a matched
multidomain protein is seen in the M. jannaschii
ORF pair, MJ0234 and MJ0238. Both
match the E. coli ORF b1263, a bifunctional
enzyme of two separate domains. Both M.
jannaschii genes have been annotated, however,
by only one of the two functions:
anthranilate synthase subunit II, which is
associated only with the first 176 of b1263's
531 amino acids, and that region is matched
only by MJ0238 (Fig. 1).

What must be done to avoid continued
annotation inconsistency, incompleteness, and
erroneous propagation? First, any automation
must be rather sophisticated. It must, for a
start, recognize large differences in the length
of matching sequences; it must associate
annotation with specific subsequences; it must
recognize all differences among the annotations
of the homologs to the matched sequence;
and, whenever possible, sequence similarity
should be identified via shared conserved

carefully annotated, consistent with the entire
family characterized by that pattern. All
approaches should exploit the best available
synonym tables such as those available
through resources like PROSITE, the Enzyme
Commission, or the US National Library of
Medicine's UMLS database. Finally, any
annotation strategy must be designed to support an
evolving nomenclature and rapidly expanding
knowledgebase.
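
The requirement to associate annotation with specific subsequences can
be sketched as follows: instead of copying the best hit's description
wholesale, only the annotations of those subject domains that the
alignment actually covers are transferred. This is a minimal
illustration, not a method taken from the article; the class and
function names, the 80% coverage cut-off, and the alignment coordinates
are hypothetical. For b1263, only the length (531 residues) and the
extent of the anthranilate synthase subunit II region (first 176
residues) come from the text.

```python
from dataclasses import dataclass


@dataclass
class Domain:
    name: str
    start: int  # 1-based, inclusive
    end: int    # inclusive


def overlap_fraction(domain, aln_start, aln_end):
    """Fraction of the domain covered by the aligned subject region."""
    covered = min(domain.end, aln_end) - max(domain.start, aln_start) + 1
    return max(covered, 0) / (domain.end - domain.start + 1)


def transferable_annotations(domains, aln_start, aln_end, min_cover=0.8):
    """Names of subject domains that the alignment actually covers."""
    return [d.name for d in domains
            if overlap_fraction(d, aln_start, aln_end) >= min_cover]


# b1263: the second domain's name and exact boundaries are not given in
# the text, so the entry below is a placeholder.
B1263 = [
    Domain("anthranilate synthase subunit II", 1, 176),
    Domain("second domain (placeholder)", 177, 531),
]

if __name__ == "__main__":
    # Hypothetical alignment ranges on b1263 for two different queries.
    print(transferable_annotations(B1263, 5, 170))
    print(transferable_annotations(B1263, 200, 520))
```

A query aligned only to the second region would then not inherit the
anthranilate synthase subunit II label, which is the error described
for the M. jannaschii ORF pair.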

Even if it takes an extended period of
time to annotate the new genome data more
carefully and completely now, it will surely
be more cost effective than redoing it later.
Recall that the correcting and/or updating of
all of the historical data in largely archival
sequence databases such as GenBank or
SWISS-PROT, has not yet been completed,
probably for good reasons of cost and time.
We in the basic research and biotechnology



our impatience for the new data degrade its
annotation and longer-term utility.

1. Neer, E.J., Schmidt, C.J., Nambudripad, R. and
Smith, T.F. 1994. The ancient regulatory-protein family
of WD-repeat proteins. Nature 371:297-300.

Evidence Appendix G

Brenner (TIG 15, 4:132-133, April 1999) 

This was cited by the Examiner in the Office Action mailed on January 25, 2007. 

Outlook GENOME ANALYSIS



Errors in genome annotation 




At the time that Watson and Crick proposed a structure
for DNA, a visionary might have suggested that the
complete genetic sequence of an organism would eventually
be known. However, nobody could have realistically
proposed that machines could automatically indicate gene
functions. Yet precisely this has been achieved: with no
laboratory experiments at all, the roles of most genes in
several organisms have been reported.

But how reliable are these functional assignments, upon
which we depend for understanding genes and genomes?
Without laboratory experiments to verify the computational
methods and their expert analysis, it is impossible
to know for certain. However, a simple procedure can
place a rough upper bound on their accuracy. I have compared
three different groups' functional annotation'-^ for
the Mycoplasma genitalium genome' (Fig. 1). Where two
groups' descriptions are completely incompatible, at least
one must be in error. In my analysis, there is no penalty

Three dots represent (left to right) Fraser et al., Koonin et al. and Ouzounis et
al. annotations for each of the 468 M. genitalium genes. (Tentative case

[...]ally, while conflicting annotations are in different colors. It is unknown
if any of the annotations are actually correct. There are 3D(1 cases
[...] et al.) simply reported the SWISS-PROT annotation of the same
[...]ed by colored open circles. Because Fraser et al.




for vague or absent functional assignment. Furthermore, I 
always assume that as many groups as possible have the 
right description (Fig. 2). 

The results are disappointing for those expecting reliable 
annotation (Table 1). M. genitalium was reported to have
just 468 genes, many of which are fundamental for all life and 
therefore easy to analyse. Nonetheless, the error rate is at 
least 8% for the 340 genes annotated by two or three groups. 
This value may not be uniform across the three groups, nor 
does it reflect the overall significance of a group's results. 
Genes annotated by only one group were not considered, 
but include such improbable bacterial functions as B-cell 
enhancing factor, mitochondrial polymerase, and serotonin
receptor. This analysis cannot detect those cases where 
multiple groups arrived at consistent but wrong conclu- 
sions - a likely occurrence because all relied on similar 
methods and data. This evaluation also ignores minor dis- 
agreements in annotation, and disparities in degree of 
specificity (possibly indicating problematic overprediction 
of function'). Therefore, the true error rate must be 
greater than these figures indicate. 
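
The counting rule described above can be sketched in a few lines: for
each gene, the largest set of mutually compatible annotations is assumed
to be right, and every other annotation counts as an error. This is an
illustrative reading of the procedure, not code from the article; the
function names and the label encoding are hypothetical.

```python
from collections import Counter


def min_errors(annotations):
    """Fewest annotations that must be wrong for one gene.

    `annotations` holds one label per group that annotated the gene.
    Groups whose descriptions are compatible share a label, and groups
    giving a vague or absent function are left out (no penalty).
    """
    if not annotations:
        return 0
    largest_agreeing = max(Counter(annotations).values())
    return len(annotations) - largest_agreeing


def min_error_rate(genes):
    """Lower bound on the error rate over genes with two or more annotations."""
    scored = [a for a in genes if len(a) >= 2]
    total = sum(len(a) for a in scored)
    wrong = sum(min_errors(a) for a in scored)
    return wrong / total if total else 0.0


if __name__ == "__main__":
    # Invented example: one consistent gene, one two-way conflict,
    # and one gene annotated by a single group (ignored).
    print(min_error_rate([["A", "A", "A"], ["A", "B"], ["A"]]))
```

Because agreement between groups is always scored as correct, the
result is a lower bound on the error rate, in line with the caveats
listed above.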

There are several possible reasons why the functional 
analyses have mistakes, as described at greater length
elsewhere*. For example, it may be that the similarity
between the genomic query and database sequence is 
insufficient to reliably detect homology, an issue solvable 
by appropriate use of modern and accurate sequence com- 
parison procedures'-'". A more difficult problem is accurate 
inference of function from homology. Typical database 
searching methods are valuable for finding evolutionarily 
related proteins, but if there are only about 1000 major 
superfamilies in nature"''^ then most homologs must 
have different molecular and cellular functions. 

The annotation problem escalates dramatically beyond
the single genome, for genes with incorrect functions are
entered into public databases'. Subsequent searches
against these databases then cause errors to propagate to
[...] assignments. The procedure need cycle only a few
times before [...] that made computational function [...]
possible - the annotation databases - are so polluted as to be
almost useless. To prevent errors from spreading out of
control, database curation by the scientific community
will be essential' '^.

To ensure that databases are kept usable, the intent of a
gene annotation should be clear: does it indicate homolog,
ortholog, and/or functional equivalence? Fortunately, some 
databases already incorporate this information explicitly 
(e.g. Ref. 14). Errors will, of course, still creep in. To help 
eliminate the collateral damage, computational assign- 
ments should clearly be flagged as such, and they should 
also indicate their source (which would allow propagation 
of corrections) and a measure of confidence in their accu- 
racy. This will require new research and development in 
algorithms and databases, and a broad commitment to 
maintaining these resources. In short, the accessible documentation
needed for reproducibility of a computational
function determination should be commensurate with that 
for a corresponding laboratory bench experiment. 



(a) Consistent annotations. Annotations were generally considered consistent for this analysis if either the function or the gene name matched (e.g. mg463; mg010).
An exception is when one group gives a gene name and another specifically notes that the current gene is a paralog and not identical (consider mg010). Where the
descriptions from different groups were compatible, but at different levels of specificity, this was considered a correct assignment (e.g. mg225). The difficulty of
reconciling pairs of descriptions to determine whether they reflect compatible functions makes this analysis imprecise. Generally, the approach here is generous
and should err on the side of detecting too few errors; it is usually more permissive than Ref. [...]. mg463: Fraser et al. and Koonin et al. describe different aspects
of function, but give the same gene name. The Ouzounis et al. description is compatible with that from Koonin et al., but less specific. All three annotations are
considered correct for this analysis. mg010: Fraser et al. and Ouzounis et al. agree that this is a DNA primase. Koonin et al. use a different gene name and
explicitly state that this is a truncated protein. Because of the common functional description, all three are considered correct. However, if Koonin et al. had been
more explicit in indicating a functional difference, then their annotation would have been [...]
by all three groups) [...] with the Ouzounis et al. annotation of histidine [...]

be that histidine permease is an [...] overprediction of function, or it could be correct. The two annotations are considered consistent, and the decision of
Fraser et al. not to provide a function is not penalized. [...]
The Koonin et al. and Ouzounis et al. annotations are wholly inconsistent. This leads to a conflict and a minimum error rate of [...]%. Note that our assessment
methodology also behaves correctly when two annotators provide different functions for a bifunctional enzyme: each of the annotators is half right and half
wrong, and the assessment assigns a 50% error rate. [...]: Fraser et al. and Ouzounis et al. [...]

[...] putative, this set of annotations is right on the threshold of consistency. For this analysis, the Koonin et al. annotation was considered to be in conflict with
the others, giving a minimum error rate of 33%. [...]: all three groups provide contradictory functions. The function described by Fraser et al. of HMG-CoA
reductase is EC 1.1.1.34, while the NADH-ubiquinone oxidoreductase annotated by Ouzounis et al. [...] is EC [...]. Neither enzyme uses ATP or GTP,
as specified by Koonin et al. The analysis assumes one is correct and marks two incorrect. Note: Ouzounis et al. annotations equivalent to SWISS-PROT included in
these examples are not included in the Table 1 analysis.

Evidence Appendix H

Borks (TIG 12, 10:425-427, October 1996)

This was cited by the Examiner in the Office Action mailed on January 25, 2007.

Evidence Appendix I

An alignment of the LKR domains of the plant
bifunctional LKR/SDH proteins from Arabidopsis (SEQ ID NO:112), corn (SEQ ID
NO:122, encoded by SEQ ID NO:120) and soybean (SEQ ID NO:121) and the
monofunctional lysine-forming SDH proteins from S.cerevisiae (gi:453184),
C.albicans (gi:1170847) and Y.lipolytica (gi:173262).

This was submitted as Appendix B which accompanied the Response submitted
on July 20, 2007 and entered by Examiner in Office Communication dated March
5, 2008.

Evidence Appendix J

Comparison of the SDH domains of the bifunctional plant LKR/SDH proteins from
Arabidopsis (SEQ ID NO:112), corn (SEQ ID NO:122, encoded by SEQ ID
NO:120) and soybean (SEQ ID NO:121) and the monofunctional glutamate-forming
SDH protein from S.cerevisiae (gi:729968).

This was submitted as Appendix C which accompanied the Response submitted
on July 20, 2007 and entered by Examiner in Office Communication dated
March 5, 2008.
Evidence Appendix K

This is an alignment of the plant bifunctional proteins from Arabidopsis, corn and
soybean, SEQ ID NOs:112, 122 and 121, respectively.

This was submitted as Appendix D which accompanied the Response submitted
on July 20, 2007 and entered by the Examiner in the Office Communication dated
March 5, 2008.



EVIDENCE APPENDIX K 



Appendix D 



Appendix D shows a comparison of the amino acid sequences of the bifunctional
LKR-SDH proteins from Arabidopsis, corn and soybean, SEQ ID NOs: 112, 122 and 121,
respectively. Amino acids conserved among at least two plant sequences are indicated
with an asterisk (*) on the top row; dashes are used by the program to maximize the
alignment of the sequences. The LKR and SDH domains (boxed sequences) were
identified by Epelbaum et al. (Plant Mol. Biol. 35:735-748 (1997)) and Tang et al. (Plant
Cell 9:1305-1316 (1997)).
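
The marking rule described above (an asterisk over any column in which at least two of the three plant sequences carry the same residue, with dashes treated as alignment gaps) can be illustrated with a short Python sketch. This is only an illustration under stated assumptions: the function name `conservation_row`, its parameters, and the toy sequences are hypothetical and are not taken from the record or from the alignment program actually used.

```python
from collections import Counter


def conservation_row(aligned, min_shared=2, gap="-"):
    """Return a marker row for a block of aligned sequences.

    A column gets '*' when at least `min_shared` sequences carry the same
    residue in that column (gap characters never count); otherwise ' '.
    All names here are hypothetical, chosen for illustration only.
    """
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    marks = []
    for column in zip(*aligned):
        # Count residues in this column, ignoring gap characters.
        counts = Counter(res for res in column if res != gap)
        most_common = counts.most_common(1)
        shared = most_common[0][1] if most_common else 0
        marks.append("*" if shared >= min_shared else " ")
    return "".join(marks)


# Toy sequences invented for this example (not SEQ ID NOs: 112, 122 or 121).
toy_block = ["MKT-LV", "MRTALV", "QHSGIV"]
print(repr(conservation_row(toy_block)))
```

In the toy block, columns 1, 3, 5 and 6 have a residue shared by at least two sequences and are marked; columns 2 and 4 are left blank.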



SEQ ID NO: 112 (4NSNGHEEEKKLGNGWGILSETVNKWERRTPLTPSHCARLLHGG-KDRTGISRIWQPS
SEQ ID NO: 122 CARLLLGGGKNGPRVNRIIVQPS
SEQ ID NO: 121



SEQ ID NO: 112 IAKRIHHDALYEHVGCEISDDLSDCGLILGIKQPELEMILPERAYAFFSHTHKAQKENMPI 
SEQ ID NO: 122 [tRRIHHDAQYEDAGCEISEDLSECGLIIGIKQPKLQMILSDRAYAFFSHTHKAQKENMPI 
SEQ ID NO: 121 



SEQ ID NO: 112 LDKILSERVTLCDYELIVGDHGKRLLAFGKYAGRAGLVDFLHGLGQRYLSLGYSTPFLSL
SEQ ID NO: 122 LDKILEERVSLFDYELIVGDDGKRSLAFGKFAGRAGLIDFLHGLGQRYLSLGYSTPFLSL
SEQ ID NO: 121



LKR domain



SEQ ID NO: 112 SASYMYSSLAAAKAAVISVGEEIASQGLPLGICPLVFVFTGTGNVSLGAQEIFKLLPHTF
SEQ ID NO: 122 SQSHMYPSLAAAKAAVIWAEEIATFGLPSGICPIVFVFTGVGNVSQGAQEIFKLLPHTF
SEQ ID NO: 121



SEQ ID NO: 112 \/EPSKLPELFVKDKGISQNGISTKRVYQVYGCIITSQDMVEHKDPSKSFDKADYYAHPEH 
SEQ ID NO: 122 TOAEKLPEIF-QARNLSKQSQSTKRVFQLYGCWTSRDIVSHKDPTRQFDKGDYYAHPEH 
SEQ ID NO: 121 I EPKOHVI VFDKADYYSHPEH 



SEQ ID NO: 112 YNPVFHEKISPYTSVLVNCMYWEKRFPCLLSTKQLQDLTKKGLPLVGICDITCDIGGSIE
SEQ ID NO: 122 YTPVFHERIAPYASVIVNCMYWEKRFPPLLNMDQLQQLMETGCPLVGVCDITCDIGGSIE
SEQ ID NO: 121 YNPTFHEKIAPYASVIVNCMYWEKRFPQLPSYKQMQDLMGRGSPLVGIADITCDIGGSIE



SEQ ID NO: 112 FVNRATLIDSPFFRFNPSNNSYYDDMDGDGVLCMAVDILPTEFAKEASQHFGDILSGFVG
SEQ ID NO: 122 FINKSTSIERPFFRYDPSKNSYHDDMEGAGWCLAVDILPTEFSKEASQHFGNILSRLVA
SEQ ID NO: 121 FVNRGTSIDSPFFRYDPLTNSYHDDMEGNGVICLAVDILPTEFAKEASQHFGNILSQFW



SEQ ID NO: 112 SLASMTEISDLPAHLKRACISYRGELTSLYEYIPRMRKSNPE SAQDNIIANGVSSQRTFN
SEQ ID NO: 122 SLASVKQPAELPSYLRRACIAHAGRLTPLYEYIPRMRNTMIDLAPAK -- TNPLPDKK-YS
SEQ ID NO: 121 NLASATDITKLPAHLRRACIAHKGVLTSLYDYIPRMRSSDSEEVSENA-ENSLSNKRKYN



SEQ ID NO: 112 ILVSLSGHLFDKFLINEALDMIEAAGGSFHLAKCELGQSADAESYSELEVGADDKRVLDQ 
SEQ ID NO: 122 TLVSLSGHLFDKFLINEALDIIETAGGSFHLVRCEVGQSTDDMSYSELEVGADDTATLDK 
SEQ ID NO: 121 ISVSLSGHLFDQFLINEALDIIEAAGGSFHLVNCHVGQSIEAVSFSELEVGADNRAVLDQ 



SEQ ID NO: 112 IIDSLTRLANPNEDYISPHREANKISLKIGKVQQ- ENEIKEKPI IMTKKSGVLILGAGRVC
SEQ ID NO: 122 IIDSLTSLANEHGGDHDAGQEIE-LALKIGKVNEYETDVTIDK(((GPK -ILILGAGRVC
SEQ ID NO: 121 IIDSLTAIASPTEHDRFSNQDSSKISLKLGKVE- -ENGIEKESDfRKKAAVLILGAGRVC



SEQ ID NO: 112 RPAADFLASVRTISSQQWYKTYFGADSEEKTDVHVIVASLYLKDAKETVEGISDVEAVRL
SEQ ID NO: 122 RPAAEFLASYPDICT YGVDDHDADQIHVIVASLYQKDAEETVDGIENTTATQL
SEQ ID NO: 121 aPAAEMLSSFGRPSSSQWYKTLLEDDFECQTDVEVIVGSLYLKDAEQTVEGIPNVTGIQL



SEQ ID NO: 112 DVSDSESLLKYVSQVDWLSLLPASCHAWAKTCIELKKHLVTASYVDDETSMLHEKAKS
SEQ ID NO: 122 DVADIGSLSDLVSQVEWISLLPASFHAAIAGVCIELKKHMVTASYVDESMSNLSQAAKq
SEQ ID NO: 121 DVMDRANLCKYI SQVDWI SLLPPSCHI IVANACIELKKHLVTASYVDSSMSMLNDKAKD



SEQ ID NO: 112 ^GITILGEMGLDPGIDHMMAMKMINDAHIKKGKVKSFTSYCGGLPSPAAANNPLAYKFSW
SEQ ID NO: 122 AGVTILCEMGLDPGIDHLMSMKMIDEAHARKGKIKAFTSYCGGLPSPAAANNPLAYKFSW
SEQ ID NO: 121 ^GITILGEMGLDPGIGHMMAMKMINQAHVRKGKIKSFTSYCGGLPSPEAANNPLAYKFSW



SEQ ID NO: 112 >IPAGAIRAGQNPAKYKSNGDIIHVDGKNLYDSAARFRVPNLPAFALECFPNRDSLVYGEH
SEQ ID NO: 122 JIPAGALRSGKNPAVYKFLGETIHVDGHNLYESAKRLRLRELPAFALEHLPNRNSLIYGDI
SEQ ID NO: 121 ^PAGAIRAGRNPATYKWGGETVHIDGDDLYDSATRLRLPDLPRFALECLPNRNSLLYGDI



SEQ ID NO: 112 YGIESEATTIFRGTLRYEGFSMIMATLSKLGFFDSEANQVLSTGKRITFGALLSNILNKD
SEQ ID NO: 122 fGISKEASTIYRATXRYEGFSEIMVTLSKTGFFDAANHPLLQDTSRPTYKGFLDELLNNI
SEQ ID NO: 121 YGI-TEASTIFRGTLRYEGFSEIMGTLSRISLFNNEAHSLLMNGQRPTFKKFLFELLKW



SEQ ID NO: 112 ADNESEPLAG EEEISKRIIKLGHSKE-- TAAKAAKTIVFLGFNEEREVPSLCKSV^
SEQ ID NO: 122 STINTDLDIEASGGYDDDLIARLLKLGCCKNKEIAVKTVKTIKFLGLHEETQIPKGCSS0
SEQ ID NO: 121 GDNPDELLIG ENDIMEQILIQGHCKDQRTAMETAKTIIFLGLLDQTEIPASCKSjJ



SEQ ID NO: 112 FDATCYLMEEKLAYSGNEQDMVLLHHEVEVEFLESKRIEKHTATLLEFGDIKNGQTTTA]
SEQ ID NO: 122 FDVICQRMEQRMAYGHNEQDMVLLHHEVEVEYPDGQPAEKHQATLLEFGKVENGRSTTA]
SEQ ID NO: 121 FDVACFRMEERLSYTSTEKDMVLLHHEVEIEYPDSQITEKHRATLLEFGKTLDEKTTTAJ



SEQ ID NO: 112 ^KT VG I P AA I GALVL I E DK I KTRGVLRPLEAE VY LPALDIL Q--
SEQ ID NO: 122 ALTVGIPAAIGALLLLKNKVQTKGVIRPLQPEIYVPALEILESSGIKLVEKVET- . KFPE
SEQ ID NO: 121 ALTVGIPAAVGALLLLTNKIQTRGVLRPIEPEVYNPALDII E--



SEQ ID NO: 112 --AY GIKLME KAE.-
SEQ ID NO: 122 TQIKI-V.YSRAHVSFVLTPFWNI-YL.-TKM.QIKRTGGVYCKRRQRNLCIYDLSISN^
SEQ ID NO: 121 AY GIKLIE KT-E-



SEQ ID NO: 112
SEQ ID NO: 122
SEQ ID NO: 121